Exascale unlikely before 2020 due to budget woes

Prototype systems still eyed for 2018, but only if Congress approves billions in funding, say U.S. DOE officials


To give some perspective on the scale of that goal, researchers at Argonne National Laboratory developed a multi-petaflops simulation of the universe. Salman Habib, a physicist at the lab, said the simulation sustained 13.94 petaflops on more than 1.5 million cores of IBM's Sequoia system, with a total concurrency of 6.3 million at four threads per core.

The project is the largest cosmological simulation to date.
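Those figures are consistent with Sequoia's publicly reported configuration. A quick illustrative check in C (the node and per-node core counts below come from IBM's published Blue Gene/Q specifications, not from the article itself):

    /* Illustrative check of the reported concurrency figures: Sequoia has
       98,304 Blue Gene/Q nodes with 16 compute cores each, and each core
       runs 4 hardware threads, which lines up with the "more than 1.5
       million cores" and 6.3 million concurrency Habib cites. */
    #include <stdio.h>

    int main(void) {
        long nodes = 98304;           /* Sequoia node count              */
        long cores_per_node = 16;     /* Blue Gene/Q compute cores/node  */
        long threads_per_core = 4;    /* hardware threads per core       */

        long cores = nodes * cores_per_node;          /* 1,572,864 cores   */
        long concurrency = cores * threads_per_core;  /* 6,291,456 threads */

        printf("cores: %ld  total concurrency: %ld\n", cores, concurrency);
        return 0;
    }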

"Much as we would all like to, we can't build our own universes to test various ideas about what is happening in the one real universe. Because of this inability to carry out true cosmological experiments, we run virtual experiments inside the computer and then compare the results against observations -- in this sense, large-scale computing is absolutely necessary for cosmology," said Habib.

To accomplish the task, researchers must run hundreds or thousands of virtual universes to tune their understanding. "To carry out such simulation campaigns at high fidelity requires computer power at the exascale," said Habib. "What is exciting is that by the time this power will be available, the observations and the simulations will also be keeping pace."

The total number of nodes in an exascale system will likely be in the 100,000 range, about the same as in today's smaller, petascale systems. Each node, though, is becoming more parallel and powerful, said Pete Beckman, director of the Exascale Technology and Computing Institute at Argonne National Laboratory.

An IBM Blue Gene/Q node, for instance, has 16 compute cores running 64 hardware threads. As time goes on, the number of threads per node will climb into the hundreds and eventually to upwards of a thousand.

"Now, when you have 1,000 independent threads of operation on a node, then the whole system ends up with billion-way concurrency," said Beckman.

"The real change is programming in the node and the parallelism to hide latency, to hide the communication to the other nodes, so that requires lots of parallelism and concurrency," said Beckman.

The new systems will require adaptive programming models, said Beckman. Until an approach is settled, he said, it is going to be a "disruptive few years in terms of programming models."
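One concrete form of the latency hiding Beckman describes is overlapping communication with local computation, something MPI codes can already express with nonblocking sends and receives. A minimal sketch (the ring exchange, buffer size, and dummy compute loop are placeholders for illustration):

    /* Hide communication latency behind local work with nonblocking MPI.
       Each rank starts an exchange with its ring neighbors, computes on
       local data while the messages are in flight, then waits. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double send_buf[N], recv_buf[N], local[N];
        for (int i = 0; i < N; i++) { send_buf[i] = rank; local[i] = i; }

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        /* Start the exchange, then do local work while data moves. */
        MPI_Request reqs[2];
        MPI_Irecv(recv_buf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(send_buf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        double sum = 0.0;
        for (int i = 0; i < N; i++)          /* placeholder computation */
            sum += local[i] * local[i];

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* exchange complete */

        if (rank == 0)
            printf("local sum %.1f, first received value %.1f\n",
                   sum, recv_buf[0]);

        MPI_Finalize();
        return 0;
    }

As thread counts per node climb, the same kind of overlap has to be expressed inside the node as well, which is the gap the newer, adaptive programming models are meant to fill.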

Vendors will have to change their approaches to building software, said Harrod.

"Almost all the vendors have 50 years of legacy built into their system software - 50 years of effort where nobody ever cared about energy efficiency, reliability, minimizing data movement - that's not there, so therefore we need to change that," said Harrod.

Harrod believes the problems can be solved, but that the U.S. will have to invest in new technologies. "We have to push the vendors to go where they are not really interested in going," he said.


Originally published on Computerworld.