The 64-bit CPU era has ushered in a raft of innovations and computing models that were impossible on 32-bit systems. With their 4GB memory limit, about 1GB of which was lost to the OS, 32-bit systems couldn't do very much and often ran at only about 5% utilization.
But with 64-bit CPUs, the sky's the limit. In theory you can put 16 exabytes of RAM in a 64-bit system, which might get a little expensive. Machines with 128GB or more, though, are becoming commonplace, opening the door to virtualization, cloud computing, in-memory databases and Big Data.
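The 4GB and 16-exabyte figures follow directly from the width of a memory address: an n-bit address can name 2^n distinct bytes. A minimal Python check, using binary units (GiB and EiB), which is the sense in which these round numbers hold exactly:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


GIB = 2 ** 30  # one gibibyte
EIB = 2 ** 60  # one exbibyte

print(f"32-bit: {addressable_bytes(32) // GIB} GiB")  # 32-bit: 4 GiB
print(f"64-bit: {addressable_bytes(64) // EIB} EiB")  # 64-bit: 16 EiB
```

Real 64-bit processors implement fewer physical address bits than the full 64, which is one reason the 16-exabyte figure stays theoretical.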
Big Data brings new challenges, however. Some data sets are simply too big to fit in RAM, even on a machine loaded with memory and even across a cluster.
A team of researchers at MIT came up with a truly radical solution: they got rid of the memory. Instead, they built a cluster called BlueDBM that keeps its data entirely on solid-state (flash) drives.
"It’s about a tenth as expensive, and it consumes about a tenth as much power," according to a statement from the MIT News Office. "The problem is that it’s also a tenth as fast."
The researchers noted that if the nodes in a conventional DRAM-based cluster need to request data from disk as little as five percent of the time, the overall performance of the task drops to a level comparable to that of the experimental flash-only cluster. In other words, once the cluster has to go to disk more than 5% of the time, its DRAM stops paying off: it is no faster than the all-flash, no-DRAM design.
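Why does such a small miss rate hurt so much? The average cost of an access is a weighted mix of a very fast case and a very slow one, so the slow case dominates almost immediately. The sketch below uses a simplified weighted-average model, not the researchers' own analysis, and the two latency constants are illustrative assumptions rather than measured figures.

```python
# Illustrative latencies: assumptions for this sketch, not numbers from the MIT paper.
DRAM_LATENCY_US = 0.1     # roughly 100 ns per DRAM access (assumed)
DISK_LATENCY_US = 1000.0  # roughly 1 ms per trip to disk (assumed)


def effective_latency_us(miss_rate: float, dram_us: float, disk_us: float) -> float:
    """Average time per access when a fraction `miss_rate` of accesses go to disk."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return (1.0 - miss_rate) * dram_us + miss_rate * disk_us


def disk_share(miss_rate: float, dram_us: float, disk_us: float) -> float:
    """Fraction of total access time spent waiting on disk."""
    total = effective_latency_us(miss_rate, dram_us, disk_us)
    return miss_rate * disk_us / total


for rate in (0.0, 0.01, 0.05):
    avg = effective_latency_us(rate, DRAM_LATENCY_US, DISK_LATENCY_US)
    share = disk_share(rate, DRAM_LATENCY_US, DISK_LATENCY_US)
    print(f"miss rate {rate:.0%}: avg {avg:.3f} us, {share:.1%} of time on disk")
```

With these assumed numbers, a 5% miss rate makes the average access roughly 500 times slower than pure DRAM, and more than 99% of the time is spent waiting on disk. The fast memory is still there, but it barely matters.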
"40 servers with 10 terabytes’ worth of RAM couldn’t handle a 10.5-terabyte computation any better than 20 servers with 20 terabytes’ worth of flash memory, which would consume only a fraction as much power," the researchers wrote in their paper.
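The figures in that quote line up with the 5% threshold. As a back-of-the-envelope check, assume accesses are spread evenly across the data (real workloads rarely are). Then the half-terabyte that does not fit in 10TB of RAM is about 4.8% of the data set, right around the point where the DRAM cluster loses its edge. The sketch also assumes decimal units (1TB = 1,000GB).

```python
# Cluster figures from the quote; decimal units (1 TB = 1,000 GB) are assumed.
DATASET_TB = 10.5
RAM_CLUSTER = {"servers": 40, "capacity_tb": 10.0}
FLASH_CLUSTER = {"servers": 20, "capacity_tb": 20.0}


def per_server_gb(cluster: dict) -> float:
    """Capacity each server contributes, in GB."""
    return cluster["capacity_tb"] * 1000 / cluster["servers"]


def spill_fraction(dataset_tb: float, capacity_tb: float) -> float:
    """Share of the data set that does not fit in the cluster's aggregate capacity."""
    return max(0.0, dataset_tb - capacity_tb) / dataset_tb


print(per_server_gb(RAM_CLUSTER))    # 250.0
print(per_server_gb(FLASH_CLUSTER))  # 1000.0
print(f"{spill_fraction(DATASET_TB, RAM_CLUSTER['capacity_tb']):.1%}")  # 4.8%
print(spill_fraction(DATASET_TB, FLASH_CLUSTER["capacity_tb"]))  # 0.0
```

The 1TB of flash per server also matches the two 500GB SSDs per node that the team used.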
The researchers moved a little computational power off the servers and onto the chips that control the flash drives. By preprocessing some of the data on the flash drives before passing it back to the servers, those chips can make distributed computation much more efficient. It also eliminates the overhead of running an operating system, since the preprocessing algorithms are wired into the chips.
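Preprocessing data where it is stored is often called near-data processing or filter pushdown. The sketch below is a hypothetical model, not BlueDBM's interface: `FlashDevice`, `read_all` and `read_filtered` are invented names, and the "controller" is just a Python method. It shows why running a filter on the storage side cuts the amount of data that has to travel to the host.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Predicate = Callable[[bytes], bool]


@dataclass
class FlashDevice:
    """Hypothetical flash device whose controller can optionally run a filter."""
    records: List[bytes] = field(default_factory=list)

    def read_all(self) -> List[bytes]:
        # Conventional path: every record is shipped to the host.
        return list(self.records)

    def read_filtered(self, predicate: Predicate) -> List[bytes]:
        # Near-data path: the controller applies the predicate and ships only matches.
        return [r for r in self.records if predicate(r)]


def host_side_query(dev: FlashDevice, predicate: Predicate) -> Tuple[List[bytes], int]:
    """Filter on the host; returns (matches, bytes transferred)."""
    shipped = dev.read_all()
    return [r for r in shipped if predicate(r)], sum(len(r) for r in shipped)


def near_data_query(dev: FlashDevice, predicate: Predicate) -> Tuple[List[bytes], int]:
    """Filter on the device; returns (matches, bytes transferred)."""
    shipped = dev.read_filtered(predicate)
    return shipped, sum(len(r) for r in shipped)


dev = FlashDevice([f"record-{i:04d}".encode() for i in range(1000)])
wanted: Predicate = lambda r: r.endswith(b"7")

host_matches, host_bytes = host_side_query(dev, wanted)
near_matches, near_bytes = near_data_query(dev, wanted)
print(host_bytes, near_bytes)  # 11000 1100
```

Both queries return the same 100 records, but the near-data version moves a tenth of the bytes. In hardware, the saving compounds across every node in a cluster.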
MIT also chose FPGAs rather than conventional CPUs to do this work, for several reasons. For starters, the FPGAs act as a kind of storage fabric, so any server in the cluster can retrieve data from any of the connected flash storage devices. The FPGAs can also run application-specific algorithms that preprocess the data stored on the flash drives, which further speeds up processing.
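One way to picture the storage fabric is as a single, cluster-wide address space. A server asks for a global page number, and the fabric works out which node and which drive holds it. The sketch below assumes simple round-robin striping over the 20-node, two-SSDs-per-node layout the team built. The striping scheme and every name in it are hypothetical, not BlueDBM's actual addressing.

```python
from typing import NamedTuple

NODES = 20            # servers (one FPGA each) in the cluster
DEVICES_PER_NODE = 2  # SSDs attached to each FPGA


class Location(NamedTuple):
    node: int        # which server/FPGA holds the page
    device: int      # which SSD on that node
    local_page: int  # page index within that SSD


def locate(global_page: int, nodes: int = NODES,
           devices_per_node: int = DEVICES_PER_NODE) -> Location:
    """Map a cluster-wide page number to a physical location (hypothetical round-robin striping)."""
    if global_page < 0:
        raise ValueError("global_page must be non-negative")
    total_devices = nodes * devices_per_node
    device_index = global_page % total_devices  # consecutive pages rotate across all devices
    return Location(
        node=device_index // devices_per_node,
        device=device_index % devices_per_node,
        local_page=global_page // total_devices,
    )


print(locate(0))   # Location(node=0, device=0, local_page=0)
print(locate(45))  # Location(node=2, device=1, local_page=1)
```

Striping consecutive pages across every drive spreads a large sequential scan over all 40 SSDs at once, which is one plausible reason to build a shared fabric rather than leaving each drive private to its own server.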
Intel's obsession with FPGAs is starting to make real sense now.
The group used components from Quanta, which has strong ties to MIT, along with Xilinx and Samsung. They built a cluster of 20 servers, each with an FPGA connected to two 500GB SSDs.
"This is not a replacement for DRAM [dynamic RAM] or anything like that," said Arvind Mithal, a professor of computer science at MIT, whose group performed the new work. "But there may be many applications that can take advantage of this new style of architecture. Which companies recognize: Everybody’s experimenting with different aspects of flash. We’re just trying to establish another point in the design space."
What they have done, really, is get a little ahead of where the memory industry is going. Many firms are working on memory that combines the density and persistence of NAND flash with the speed of DRAM. ReRAM and MRAM are two of several memory types in this space, but they are thought to be several years off.
At the basic level of functionality, though, the MIT crew has done what ReRAM and MRAM seek to do: eliminate the boundary between storage and working memory, and process data where it resides rather than loading it into memory.