Hadoop: An elephant born tap-dancing

Nothing in open source is more exciting than the constellation of software around Hadoop. Technically, Hadoop is just a small part of a big stack of software that keeps a number of machines crunching together on a single problem. But as you may have noticed, your boss's boss has learned to drop "Hadoop" as a buzzword. As a result, we often overlook related programs such as Pig or Hive, even though they can be more useful than Hadoop itself.
Hadoop is the poster child for big data. It began as a small experiment based on Google's MapReduce technology and grew into a stack of code for those who need to do big things with data spread out across a rack of nodes. The tool has been so successful that we've heard rumors the Google engineers who pioneered the MapReduce paradigm are jealous of the innovation going on in Hadoop. Google got the ball rolling, but the open source nature of Hadoop allowed the rest of the Internet to surpass the biggest dog in big data.
This is especially important to the perennial also-ran, Yahoo, which was one of the early believers in Hadoop and supported much of the original work. Open source believers point to Hadoop as a huge victory for the open source strategy: Yahoo shared with everyone else and everyone else shared back, yielding a whole new software ecosystem -- one that's driving the hottest industry trend today.
Embedded in Hadoop is the ongoing tension between open and proprietary. A swarm of startups has sprung from the open source Hadoop code, each adding just enough proprietary contributions to attract and retain customers. This debate is being played out as one company alone, Hortonworks, tries to keep its entire platform open. Will Hortonworks succeed? One competitor told me archly, "It's nice to see that Hortonworks finally got a platform out."
Yet pragmatists see this tension as a creative force that fosters exciting new businesses. The core of Hadoop is still pretty close to a standard, which makes life easier for everyone. The extras keep everything running and pay for the upkeep of the core. Programmers need to eat, and the secret sauce is the best way to justify their salaries, while the core stays open and keeps improving.
From rows and relations to keys and columns

Hadoop and its satellites are not the only projects working to solve large and complex data problems -- or simpler data problems, for that matter. After decades of throwing every sort of data into relational database management systems, we're now seeing a slew of open source alternatives to the traditional data store.