Supercomputers face growing resilience problems

As the number of components in large supercomputers grows, so does the possibility of component failure

By , IDG News Service |  Hardware

The researchers found that 70 percent of correlated failures provide a window of opportunity of more than 10 seconds. In other words, when the first sign of a failure has been detected, the system may have up to 10 seconds to save its work, or move the work to another node, before a more critical failure occurs. "Failure prediction can be merged with other fault-tolerance techniques," Gainaru said.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Answers - Powered by ITworld

ITworld Answers helps you solve problems and share expertise. Ask a question or take a crack at answering the new questions below.

Ask a Question