For the first time in recent memory, a contract to build a supercomputer has gone to a chip vendor instead of a systems vendor. The Department of Energy announced a $200 million contract on Thursday for Intel, not IBM, HP, or an HPC systems vendor, to build a 180 petaFLOP supercomputer.
Intel, of course, does not make systems. That task falls to Cray, working as a subcontractor, which has racked up its share of HPC systems in recent years. On the November 2014 Top 500 supercomputer list, Cray had 62 systems, including the number two system, Titan, a 27 petaFLOP machine powered by AMD Opteron and Nvidia K20c processors.
The new machine, dubbed Aurora, will be based at the Argonne National Laboratory outside Chicago and won't be commissioned until 2018. One question that remains is what processors will be used. More and more of the top supercomputers use a GPU accelerator or Intel's Xeon Phi co-processor. Intel hasn't even laid out a roadmap that far ahead, so the processors remain a mystery.
Argonne and Intel will also provide an interim system, called Theta, to be delivered in 2016, which will help users of the Argonne Leadership Computing Facility (ALCF) transition their applications to the new server technology. The DoE expects the supercomputer will help with research in materials science, biological science, transportation efficiency and renewable energy.
"Argonne National Laboratory's announcement of the Aurora supercomputer will advance low-carbon energy technologies and our fundamental understanding of the universe, while maintaining United States' global leadership in high performance computing," Energy Department Undersecretary for Science and Energy Lynn Orr said in a statement. "This machine – part of the Department of Energy's CORAL initiative – will put the United States one step closer to exascale computing."
CORAL is short for Collaboration of Oak Ridge, Argonne, and Lawrence Livermore. Those three facilities are home to multiple supercomputers, and the DoE has spent a total of $525 million in recent years on new systems for them. Oak Ridge, in Tennessee, has Titan and a future system, called Summit, which will run IBM Power9 processors and Nvidia Volta GPUs. Summit will have a theoretical peak of 150 to 300 petaFLOPs, which will make it competitive with Aurora. It's due in 2017.
Livermore, which has always run IBM hardware, is due for a new system also running Power9 processors. Sierra, also set to be installed in 2017 (since the Power9 isn't even done yet), will run at 100+ petaFLOPs.
Aurora is to be built on Cray's next-generation Shasta computer architecture, which integrates the entire system: processor, networking and memory technologies, cabinet packaging options, power and cooling systems, and a productive software ecosystem. Cray was short on details because Shasta won't ship until 2018, so it's still in the development stage.
One of the big improvements in these systems is power efficiency. For example, Titan is a 27 petaFLOP system that consumes around nine megawatts of power. Its successor, Summit, will deliver 150 petaFLOPs and consume just 10 megawatts. And Theta, the in-between machine for scientists to prepare for Aurora, will provide 8.5 petaFLOPs while consuming just 1.7 megawatts of power.
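To put those figures on a common scale, here is a rough back-of-the-envelope sketch in Python that converts each machine's quoted peak performance and power draw into gigaFLOPs per watt. The function and variable names are made up for illustration, the Summit figure is read as 10 megawatts, and the inputs are the peak numbers quoted above rather than measured results.

```python
# Peak performance (petaFLOPs) and power draw (megawatts) as quoted in the article.
# Summit uses the 150 petaFLOP figure and assumes a 10 megawatt power draw.
SYSTEMS = {
    "Titan": (27.0, 9.0),
    "Summit": (150.0, 10.0),
    "Theta": (8.5, 1.7),
}


def gflops_per_watt(petaflops, megawatts):
    """Convert peak petaFLOPs and megawatts into gigaFLOPs per watt.

    1 petaFLOP per megawatt = 1e15 / 1e6 FLOPS per watt = 1 gigaFLOP per watt.
    """
    flops = petaflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e9


def efficiency_table(systems=SYSTEMS):
    """Return a dict mapping each system name to its gigaFLOPs per watt."""
    return {name: gflops_per_watt(pf, mw) for name, (pf, mw) in systems.items()}


if __name__ == "__main__":
    for name, eff in efficiency_table().items():
        print(f"{name}: {eff:.1f} gigaFLOPs per watt")
```

On these numbers, Titan works out to roughly 3 gigaFLOPs per watt, Theta to about 5 and Summit to about 15, or around five times Titan's efficiency at peak.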
The DoE went with Intel and not Cray directly because the CORAL program always intended to have two vendors, one for Oak Ridge and one for Argonne. Livermore then chose one of the two winners for its machine.
There are multiple reasons for choosing two vendors: diversity of hardware, as some vendors have strengths others don't; not putting all your eggs in one basket; and the labs' differing preferences for architecture. Oak Ridge clearly likes Nvidia GPUs while Livermore prefers IBM.
Now comes the real waiting game to see just what kinds of processors IBM, Intel and Nvidia will have out in 2018.
UPDATE, 4/12/15, to clarify the DoE's bidding process.