How is Hadoop able to provide such high capacity at a lower cost than relational database deployments?


What is it that is so different about Hadoop that allows it to keep the cost per terabyte of hard disk capacity so much lower than other NoSQL processing platforms? One figure I just saw was that Hadoop can support 4TB of hard disk capacity per node, with each node costing around $4,000, while other choices were averaging $10,000 to $12,000 per terabyte. This is such a huge difference, and I'm wondering exactly why this is.

Tags: Hadoop, NoSQL


2 total
Vote Up (22)

The Hadoop Distributed File System, or HDFS, is one key to the decreased cost, along with the Hive distributed data warehouse. Most people know that Hadoop is very scalable, and what that scalability allows is the distribution of big data processing tasks across many nodes built on cheap x86 servers. Each node costs about 4 grand because those servers are inexpensive, whereas most relational database deployments run at around $10,000 to $12,000 per terabyte, as you noted.
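
To make the comparison concrete, here is a rough sketch of the arithmetic using only the figures quoted in the question ($4,000 per node, 4TB per node, $10,000 to $12,000 per terabyte for the alternatives). The function name and the `replication` parameter are my own illustration, not anything from Hadoop itself. HDFS keeps three copies of each block by default, so the second number assumes that default and shows the cost per usable terabyte; real deployments will vary.

```python
def cost_per_terabyte(node_cost_usd: float, node_capacity_tb: float, replication: int = 1) -> float:
    """Return the hardware cost in USD per usable terabyte.

    `replication` is the number of copies kept of each block (an assumption
    for illustration; 1 means raw capacity with no redundancy).
    """
    if node_capacity_tb <= 0 or replication < 1:
        raise ValueError("capacity must be positive and replication at least 1")
    # Each stored terabyte consumes `replication` terabytes of raw disk.
    return node_cost_usd * replication / node_capacity_tb


# Figures quoted in the question.
hadoop_raw = cost_per_terabyte(4000, 4)
# Assuming the HDFS default of three replicas per block.
hadoop_replicated = cost_per_terabyte(4000, 4, replication=3)
relational_low, relational_high = 10_000, 12_000

print(f"Hadoop, raw capacity:      ${hadoop_raw:,.0f} per TB")
print(f"Hadoop, 3x replication:    ${hadoop_replicated:,.0f} per TB")
print(f"Quoted alternatives:       ${relational_low:,} to ${relational_high:,} per TB")
```

Even with three copies of everything, the quoted node price works out to well under the per-terabyte figures given for the alternatives.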

Vote Up (16)

Hi JOiseau,

Here's a brief blurb on why Hadoop itself is so cheap, compared to other software solutions.

Why the Hoopla over Hadoop?

"Hadoop is cheap for two reasons. First, it's open source, which means no proprietary licensing fees. Hadoop is an Apache Software Foundation project, created by Doug Cutting and Mike Cafarella, who implemented an idea described in papers by Google. Apache maintains a list of Hadoop distributions, and IBM has announced plans to offer its own distribution.

Second, Hadoop doesn't require special processors or expensive hardware. As Cutting explained, you just need “something with an Intel processor, a networking card, and usually four or so hard drives in it.”"
