"It performs interactive puts and gets of individual values," Cutting explains. "But it also supports batch. It shares storage with HDFS and with every other component of the stack. And I think that's really what's led to its popularity. It's integrated into the rest of the system. It's not a separate system on the side that you need to move data in and out of. It can share other aspects of the stack: It can share availability, security, disaster recovery. There's a lot of room to permit folks to only have one copy of their data and only one installation of this technology stack."
Looking Ahead to the Hadoop Holy Grail
But if Hadoop is not defined by batch, if it is going to be a more general data processing platform, what will it look like and how will it get there?
"I think there are a number of things we'd like to see in the sort of "Holy Grail" big data system," Cutting says. "Of course we want it to be open source, running on commodity hardware. We also want to see linear scaling: If you need to store ten times the data, you'd like to just buy ten times the hardware and have that work automatically, no matter how big your dataset gets.
"Similarly with performance," he continues. "For batch, if you need greater throughput or lower latency, you'd like to just add more hardware. The same holds for interactive queries: Increased hardware should give you linear scaling in both performance and the amount of data processed."
"There are other things we'd like to see," he adds. "We'd like to see complex transactions, joins, a lot of technologies which this platform has lacked. I think, classically, folks have believed that they weren't ever going to be present in this platform, that when you adopted a big data platform, you were giving up certain things. I don't think that's the case. I think there's very little that we're going to have to need to give up in the long term."
Google Provided a Map
The reason, Cutting says, is that Google has shown the way to establish these elements in the Hadoop stack.