At this stage of the game, big data analytics is really about discovery. Running iterations to find correlations between data points incurs milliseconds of latency on each pass, and that latency is multiplied by millions (or billions) of iterations. Working in memory is three orders of magnitude faster than going to disk, Barth says. "Speed matters in this business."
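To see why those milliseconds matter, here is a back-of-envelope calculation. It assumes 1 millisecond per disk-bound iteration and takes "three orders of magnitude" as exactly 1,000x. Both figures are illustrative assumptions, not measurements from the vendors quoted here.

```python
# Back-of-envelope latency arithmetic. The 1 ms per-iteration disk cost and the
# 1,000x in-memory speedup are illustrative assumptions, not measurements.

DISK_LATENCY_S = 1e-3          # assumed: 1 millisecond per disk-bound iteration
IN_MEMORY_SPEEDUP = 1_000      # "three orders of magnitude"


def total_runtime_s(iterations: int, per_iteration_s: float) -> float:
    """Total wall-clock time if iterations run one after another."""
    return iterations * per_iteration_s


for iterations in (1_000_000, 1_000_000_000):
    disk = total_runtime_s(iterations, DISK_LATENCY_S)
    memory = total_runtime_s(iterations, DISK_LATENCY_S / IN_MEMORY_SPEEDUP)
    print(f"{iterations:>13,} iterations: disk {disk / 3600:8.2f} h, "
          f"in memory {memory / 60:8.2f} min")
```

Under these assumptions, a billion sequential iterations take roughly 278 hours (about 11.6 days) against disk and about 17 minutes in memory.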
Ever wonder how Facebook can tag you in a photo as soon as it goes live on the site? A photo is a big file, and Facebook has exabytes of photos on file. Facebook runs an algorithm against every photo that finds faces and reduces each face to a few data points, says Revolution's Smith. This shrinks a 40 MB photo down to about 40 bytes of data. The data then goes into a "black box," which determines whose face it is, tags it, searches for that person's account and all the accounts associated with that person, and sends everyone a message.
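The reduction Smith describes can be pictured with a toy example. This is not Facebook's algorithm: it simply assumes a face signature of 10 float32 values (10 x 4 bytes = 40 bytes) and identifies a face by nearest-neighbor matching against known signatures. The names `pack_signature` and `nearest_identity`, the feature values, and the identities are all hypothetical; a real system would derive the features with a trained model.

```python
import math
import struct

# Hypothetical illustration: a face reduced to 10 float32 values (40 bytes).
# Feature values and identities are invented placeholders.

SIGNATURE_DIMS = 10


def pack_signature(features: list[float]) -> bytes:
    """Serialize a face signature as little-endian float32 values."""
    if len(features) != SIGNATURE_DIMS:
        raise ValueError(f"expected {SIGNATURE_DIMS} features")
    return struct.pack(f"<{SIGNATURE_DIMS}f", *features)


def nearest_identity(query: list[float], known: dict[str, list[float]]) -> str:
    """Return the known identity whose signature is closest (Euclidean distance)."""
    return min(known, key=lambda name: math.dist(query, known[name]))


known_faces = {
    "alice": [0.1] * SIGNATURE_DIMS,
    "bob": [0.9] * SIGNATURE_DIMS,
}
query = [0.15] * SIGNATURE_DIMS

print(len(pack_signature(query)))                # 40 bytes per face
print(nearest_identity(query, known_faces))      # alice
print(40_000_000 // len(pack_signature(query)))  # 40 MB (decimal) / 40 B = 1000000
```

Because each signature is tiny, millions of them fit comfortably in memory, which is what makes a fast lookup step plausible.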
That's big data at work. But it's also how in-memory analytics makes big data work. Currently, most people don't put more than 100 MB into an in-memory cache at any one time because of limitations in Java. The more data you put into memory, Nakamura says, the more you have to tune the Java virtual machine. "It gets slower, not faster, and that is problematic when you are a performance-at-scale play." (Terracotta's Big Memory product line gets around this issue.)
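A cache that holds itself to a fixed size budget is one common way to stay under such a ceiling. The sketch below is a generic least-recently-used (LRU) cache capped by total bytes; the class name `ByteBudgetCache` is hypothetical, the 100 MB default simply mirrors the figure above, and the sketch does not model JVM heap tuning or how Terracotta's product works.

```python
from collections import OrderedDict
from typing import Optional

# Minimal sketch of an in-memory cache capped by a byte budget, evicting the
# least recently used entries first. It does not model JVM behavior.


class ByteBudgetCache:
    def __init__(self, max_bytes: int = 100 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        if len(value) > self.max_bytes:
            raise ValueError("value larger than the whole cache budget")
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = value
        self.used_bytes += len(value)
        # Evict least recently used entries until we are back under budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = ByteBudgetCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # touch "a", so "b" becomes least recently used
cache.put("c", b"123")    # 13 bytes > 10: evicts "b"
print(cache.get("b"), cache.used_bytes)
```

The budget check happens on every insert, so memory use stays bounded no matter how much data flows through.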
For now, in-memory analytics is well-suited to high-frequency, low-computation number crunching. Of course, when you have terabytes of data available for real-time analytics, that will change. In this case, the technology needs to catch up to the need, not the other way around. The need exists, the data exists and, based on the number of announcements coming out of Hadoop World in October, the technology is on its way. No chicken and egg here.
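As one illustration of "high-frequency, low-computation" work, consider a sliding-window average kept entirely in memory. Each new value costs a constant amount of work: add it to a running sum and subtract the value that falls out of the window. The class name `SlidingWindowMean`, the window size, and the sample values are assumptions for illustration only.

```python
from collections import deque

# Sketch of high-frequency, low-computation work: a fixed-size sliding-window
# average held in memory, updated in O(1) per incoming value.


class SlidingWindowMean:
    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self._values = deque(maxlen=window)
        self._sum = 0.0

    def update(self, value: float) -> float:
        """Add a value and return the mean of the most recent `window` values."""
        if len(self._values) == self._values.maxlen:
            self._sum -= self._values[0]  # oldest value is about to be dropped
        self._values.append(value)        # deque with maxlen discards the oldest
        self._sum += value
        return self._sum / len(self._values)


ticks = [10.0, 11.0, 12.0, 13.0, 14.0]
mean = SlidingWindowMean(window=3)
print([mean.update(t) for t in ticks])  # [10.0, 10.5, 11.0, 12.0, 13.0]
```

The per-update cost stays constant however long the stream runs, which is the kind of workload that fits comfortably in memory today.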