January 13, 2012, 11:00 AM —
If your organization seems to be a good fit for Hadoop, you can download the open source software that comprises the data framework and try it out with relative ease.
So far in this series, you've learned some of what it takes to be ready to administer Hadoop, and seen the benefits and drawbacks to using it. In this final installment, we'll examine the techniques and costs involved in moving to Hadoop from an existing RDBMS, see how companies are deploying Hadoop, and learn about tools you can use to analyze Hadoop data faster and more cheaply than any RDBMS.
Like many up-and-coming technologies, particularly those in the open source world, Hadoop has enjoyed the benefits of the do-it-yourself spirit of IT shops that want to take it out for a spin. Now that Hadoop is getting a lot of attention in technology media and conferences, so C-level executives are also getting into the Hadoop act, wanting to see just how much money Hadoop can save their companies. Two separate vectors of adoption -- from the trenches and from the bosses -- are common enough to warrant a closer look.
Bottom up: The shadow knows
Shadow IT is either a blessing or a curse to an organization. Many's the time when an experimental or sandbox configuration has ended up paying off in big ways for the organization's bottom line. Linux, for instance, was one such beneficiary of shadow IT at the turn of the century.
Today, it's sometimes Hadoop's turn in the shadows, according to Arun Murthy, VP, Apache Hadoop at the Apache Software Foundation.
"In the bottom-up method of deployment, usually there's a couple of engineers who download and deploy Hadoop either on a single node or maybe a small cluster with four or five nodes," Murthy explained.
What tends to happen next is a pattern that Murthy has seen many times. Staffers using the Hadoop cluster start to notice the value of the toolset. Perhaps other divisions of the company set up their own Hadoop clusters. Eventually, the value of Hadoop rises significantly and (thanks to the scalability of the underlying distributed filesystem), the separate Hadoop clusters are combined into a single large cluster with perhaps 50 or so nodes.
According to Murthy, this is exactly what happened when Yahoo and Facebook first adopted Hadoop. Once the value of Hadoop for all of the separate teams and applications became clear, it became obvious that combining everything into one large Hadoop network would be ideal.
Of course, not many companies will need to scale up to the ten- or fifty-thousand-node systems deployed by Facebook and Yahoo, respectively, but the general principle is still the same.