Last week's Supercomputing 13 (SC13) show had its fair share of big news, but perhaps the biggest came from Intel, which announced that its future Xeon Phi, codenamed "Knights Landing," would no longer be a co-processor. Instead, it will be its own CPU and accelerator all in one.
The Xeon Phi was Intel's attempt to compete in the GPU market, trying to close a gap of a decade or more between it and competitors Nvidia and AMD/ATI. Originally known as Larrabee, the project had become a white elephant and wasted a whole lot of time and money before getting on track as a co-processor that worked in conjunction with Xeon servers.
On the November 2013 list, 13 of the top 500 supercomputers are co-powered by Xeon Phi cards, including the top supercomputer on the list, China's Tianhe-2, with a peak of 54.9 petaflops of performance. Nvidia still dominates, with 38 machines in the top 500, while AMD, which has not aggressively pursued supercomputing, has two.
But now, with Intel's Knights Landing, there won't be a need to build servers with Xeon processors and Xeon Phi co-processor cards. The Knights Landing generation will be its own processor, so there will be no more need to cram Xeon Phi cards into the server box. Those cards were the size of a high-end GPU, which meant a lot of hardware jammed into the box and a lot of heat.
Much more important is what else it takes away. Knights Landing will eliminate the memory buffer and PCI Express bus that sat between the CPU and main memory on one side and the coprocessor chip and frame buffer memory on the Xeon Phi card on the other. Because applications will run entirely natively instead of offloading data sets to the coprocessor, all of that latency goes away.
Now you will have both scalar processor cores and vector processor cores on the same chip sharing access to unified memory. This is huge. Today, a fair amount of time goes into moving data sets from main memory to the frame buffer memory of the co-processor and then back to the CPU and main memory. It's why Nvidia had to come out with the CUDA language, because plain old C++ or Java wouldn't work.
So even if all the hardware speeds and feeds remained the same, with the removal of the buses, we would see a huge gain in application performance simply because data sets no longer have to be shuttled between two memory sets across a bus. Combine that with the promise that Knights Landing will triple performance over the current generation, and you are talking major gains in supercomputing performance.
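To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It is only a toy model: it assumes the results coming back are the same size as the input, that transfers and compute do not overlap, and that the compute time is identical in both cases. The function names and every figure in it (data set size, bus bandwidth, compute time) are made up for illustration and are not Intel specifications.

```python
def offload_time(data_bytes, compute_seconds, bus_bytes_per_second):
    """Time for a job whose data set crosses a bus to a co-processor and back.

    Toy model: one copy out, one copy back (same size), no overlap
    between transfer and compute.
    """
    transfer_seconds = 2 * data_bytes / bus_bytes_per_second
    return transfer_seconds + compute_seconds


def native_time(compute_seconds):
    """Time for the same job when the cores share memory: no bus copies."""
    return compute_seconds


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
DATA_BYTES = 8 * 10**9            # an 8 GB data set
COMPUTE_SECONDS = 1.0             # time spent actually crunching numbers
BUS_BYTES_PER_SECOND = 8 * 10**9  # an assumed 8 GB/s effective bus rate

offload = offload_time(DATA_BYTES, COMPUTE_SECONDS, BUS_BYTES_PER_SECOND)
native = native_time(COMPUTE_SECONDS)

print(f"offload: {offload:.1f} s, native: {native:.1f} s, "
      f"speedup: {offload / native:.1f}x")
```

With these made-up inputs, the two bus copies take twice as long as the computation itself, so dropping them gives a threefold gain with no change to the compute hardware. Real workloads that overlap transfers with compute, or that reuse data already on the card, would see a smaller difference.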
The others aren't sitting still. AMD has its Heterogeneous System Architecture (HSA), which will continue the integration begun with Fusion, and Nvidia has a mysterious project called Project Denver, which will involve integration of its own 64-bit ARM processor with GPU technology sharing common memory.
None of this will happen tomorrow. Knights Landing will be released sometime in late 2014 or early 2015. The chip will be made using 14nm process technology and will support new AVX 3.1 instructions, a built-in DDR4 memory controller, on-package high-speed memory and likely other innovations.