June 19, 2012, 10:18 AM — Apache's Hadoop technologies are becoming critical in helping enterprises manage vast amounts of data, with users ranging from NASA to Twitter to Netflix increasing their reliance on the open source distributed computing platform.
Hadoop has gathered momentum as a mechanism for dealing with big data, the trend in which enterprises seek to derive value from the rapidly growing volumes of data in their computer systems. Recognizing Hadoop's potential, users are both deploying the existing Hadoop platform technologies and developing their own technologies to complement the Hadoop stack.
Hadoop's corporate usage now and in the future

NASA expects Hadoop to handle large data loads in projects such as its Square Kilometer Array sky-imaging effort, which will churn out 700TB per second when built in the next decade. The data systems will include Hadoop, as well as technologies such as Apache OODT (Object Oriented Data Technology), to cope with the massive data loads, says Chris Mattmann, a senior computer scientist at NASA.
Twitter is a big user of Hadoop. "All of the relevance products [offering personalized recommendations to users] have some interaction with Hadoop," says Oscar Boykin, a Twitter data scientist. The company has been using Hadoop for about four years and has even developed Scalding, a Scala library intended to make it easy to write Hadoop MapReduce jobs; it is built on top of the Cascading Java library, which is designed to abstract away Hadoop's complexity.
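Scalding's appeal is that a MapReduce job reads like ordinary Scala collection code. A minimal sketch, modeled on the canonical Scalding word-count example (the field names and the --input/--output arguments here are illustrative, not taken from the article):

```scala
import com.twitter.scalding._

// A sketch of a Scalding job: counts word occurrences in a text file.
// Input and output paths are supplied on the command line via --input / --output.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                                // emits one tuple per line of text
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+").filter(_.nonEmpty)  // map phase: split each line into words
    }
    .groupBy('word) { _.size }                           // shuffle + reduce phase: count per word
    .write(Tsv(args("output")))                          // write tab-separated (word, count) pairs
}
```

Under the hood, Scalding hands this pipeline to Cascading, which plans it as one or more Hadoop MapReduce jobs; the developer never writes Mapper or Reducer classes directly.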