How much would you pay?
There's no getting around the fact that Hadoop's most-touted benefit is probably its lower cost. Because the entire framework is open source under the Apache Software License, there are no licensing fees for the base software.
Cloudera, a commercial Hadoop vendor that employs Doug Cutting, one of the framework's inventors, uses an open core model, so the base Hadoop software is free but Cloudera's extensions have a license fee. Hortonworks, which Murthy co-founded with several members of Yahoo's Hadoop team in early 2011, keeps all of the software free and open source, and builds revenue through training and support programs.
A source of additional savings: unlike an RDBMS, Hadoop does not require expensive hardware or high-end processors. Any commodity server hooked into the Hadoop network will do. That means that a Hadoop node only needs a processor, a network card, and a few hard drives, and will cost around $4,000; an RDBMS system might cost $10,000 to $14,000 per terabyte. Such a massive difference definitely explains why Hadoop is getting such strong attention, perhaps deservedly so.
Care must be taken, however, that all of those saved dollar signs don't create a siren effect and get businesses to rush willy-nilly into a Hadoop migration plan. As I mentioned in Part 1, the type of experience Hadoop system engineers and administrators need means that companies interested in building their own Hadoop deployment are likely to end up paying a big premium in personnel costs -- whether the company deploys a commercial or free version of Hadoop. In fact, the market for qualified Hadoop engineers has gotten so hot that two of the biggest Hadoop players -- Google and Facebook -- have gotten into multi-million dollar bidding wars over them. No matter what your deployed software is, expect to pay big bucks for qualified Hadoop staff. Depending on your needs and location, that could be anywhere from $120K to $190K annually (not counting any stock and perks you may need to sweeten the deal). But is this enough to offset the savings in hardware and software?
Breaking down the costs of a completely free software deployment of Hadoop, then, and presuming 100 nodes at $4,000 each amortized over three years and one engineer paid $150,000 in salary, you get something like this:
Hourly hardware cost (over three years): $15.21
Hourly maintenance cost: $17.11
That comes out to an operational cost of about $32 per hour for the entire system, or about $283,320 total annually (excluding power).
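The arithmetic behind those hourly figures can be sketched in a few lines of Python. This is only an illustration: the function name is made up for the example, and the 8,766 hours per year (365.25 days) is an assumption, chosen because it reproduces the figures quoted above.

```python
# Assumption: 365.25 days per year, which matches the hourly figures quoted above.
HOURS_PER_YEAR = 24 * 365.25  # 8,766 hours

def hourly_costs(hardware_total, amortization_years, annual_salary):
    """Return (hardware $/hour, staff $/hour, total $/year).

    Hypothetical helper for illustration; power, support contracts,
    and migration costs are deliberately left out.
    """
    hardware_hourly = hardware_total / (amortization_years * HOURS_PER_YEAR)
    staff_hourly = annual_salary / HOURS_PER_YEAR
    annual_total = hardware_total / amortization_years + annual_salary
    return hardware_hourly, staff_hourly, annual_total

# 100 nodes at $4,000 each, amortized over three years, plus one $150,000 engineer
hw, staff, annual = hourly_costs(100 * 4_000, 3, 150_000)
print(f"hardware: ${hw:.2f}/hour")
print(f"staff:    ${staff:.2f}/hour")
print(f"annual:   ${annual:,.0f}")
```

Under these assumptions the script gives $15.21 and $17.11 per hour and an annual total of roughly $283,333, within a few dollars of the figure above.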
Now consider a similarly sized RDBMS system. In 2008, Oracle was pricing a database machine with 168 TB of storage at $650,000 for the hardware and $1.68 million for the software, which puts this system right at the top of the range, at roughly $14,000/TB. Presuming an annual Oracle database administrator's salary of $95,000, the operational costs break down to:
Hourly hardware cost (over three years): $88.60
Hourly maintenance cost: $10.27
Even with the lower salary of an Oracle administrator versus the premium salary of a Hadoop engineer, the operational cost of such an Oracle system comes to $98.87 an hour, or $866,694 annually. That's a big difference: over three times the cost of a similar-sized Hadoop deployment.
Assuming the lower end of the RDBMS cost scale ($10,000/TB) doesn't improve things that much. Plugging that number in gets you an annual cost of $644,827 -- still roughly 2.3 times the cost of the Hadoop deployment.
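For readers who want to plug in their own numbers, the comparison can be roughed out as follows. This is a sketch under the article's stated assumptions (three-year amortization, one staff member per system, power and support excluded); the function and variable names are invented for the example. The totals land within about 2 percent of the figures quoted above, with the gaps coming from rounding and the salary assumption.

```python
def annual_cost(capital_cost, amortization_years, annual_salary):
    """Yearly operational cost: amortized capital plus one salary.

    Hypothetical helper; it ignores power, support, and migration costs.
    """
    return capital_cost / amortization_years + annual_salary

# Hadoop: 100 nodes at $4,000 each, one engineer at $150,000
hadoop = annual_cost(100 * 4_000, 3, 150_000)

# Oracle, high end: $650,000 hardware plus $1.68 million software, admin at $95,000
oracle_high = annual_cost(650_000 + 1_680_000, 3, 95_000)

# RDBMS, low end: 168 TB at $10,000/TB, same admin salary
oracle_low = annual_cost(168 * 10_000, 3, 95_000)

for label, cost in [("Hadoop", hadoop),
                    ("RDBMS, ~$14,000/TB", oracle_high),
                    ("RDBMS, $10,000/TB", oracle_low)]:
    print(f"{label}: ${cost:,.0f}/year ({cost / hadoop:.1f}x Hadoop)")
```

Swapping in a different node count, salary, or per-terabyte price shows how sensitive the gap is to each assumption.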
These are operational costs, of course, and they don't factor in the migration costs, nor any costs for ongoing Hadoop support should you decide to use an outside vendor. But the dramatic difference in costs means that even when paying a Hadoop admin a premium, companies will still save a big chunk of change in the long run.
Moving down the road
With lower hardware costs and such strong business advantages for any size organization that wants to get the most out of their data, Hadoop's benefits are attracting a lot of attention in the enterprise and SMB spaces.
In Part 3 of this series, we'll examine the techniques and costs involved in moving to Hadoop from an existing RDBMS, look at how companies are testing Hadoop, and learn about tools you can use to analyze Hadoop data faster and more cheaply than with any RDBMS.
This article, "Hadoop vs. an RDBMS: How much (less) would you pay?," was originally published at ITworld.