February 21, 2012, 2:27 PM — The focus on big data is amusing to many veterans of data storage systems.
While big data systems offer flexibility of scale at far more affordable prices than ever before, long-time data experts are quick to point out that these newfangled systems rely on techniques and processes that were part of the data warehousing hype that once garnered IT's attention.
Same data warehouse, then, just a new package.
This is not to say there haven't been advances. Data warehousing has moved far beyond "traditional" relational database management systems (RDBMS), to scales that no one could call anything but "big."
For those new to this latest iteration of data warehousing, here's a look at the various tools that are prevalent in the big data marketplace today.
One aspect that makes non-relational, or NoSQL, databases unique is their independence from the Structured Query Language (SQL) used by relational databases. Relational databases all use SQL as the domain-specific language for ad-hoc queries, while non-relational databases have no such standard query language, so they can use whatever they want -- including SQL. Non-relational databases also have their own APIs, designed for maximum scalability and flexibility.
NoSQL databases are typically designed to excel in one specific area: speed. To do so, they will use techniques that will seem frightening to relational database users -- such as not promising that all data is consistent within a system all of the time.
Because so much read and write activity is needed in a single relational database transaction, a relational database could never keep up with the speed and scaling necessary to make a company like Amazon work. What Amazon does with its proprietary non-relational Dynamo database is apply an "eventually consistent" approach to its data: replicas are allowed to disagree for a short time, in exchange for speed and for continued uptime when a database server somewhere goes down.
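To make the "eventually consistent" idea concrete, here is a minimal Python sketch. It is not Amazon's implementation: the Replica and Cluster classes, the single logical counter used as a version number, and the simple "newest version wins" rule are all assumptions chosen for brevity (the published Dynamo design uses vector clocks rather than a single counter). The sketch only shows how a replica that misses a write can hold stale data for a while and then catch up.

```python
class Replica:
    """One hypothetical storage node holding key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def put(self, key, version, value):
        # Keep only the newer version (last-write-wins, a simplification).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key)


class Cluster:
    """A toy cluster: writes go to whichever replicas are reachable."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.clock = 0  # single logical counter; an assumption of this sketch

    def write(self, key, value):
        # The write succeeds even if some replicas are down.
        self.clock += 1
        acked = 0
        for replica in self.replicas:
            if replica.up:
                replica.put(key, self.clock, value)
                acked += 1
        return acked

    def sync(self):
        # Background repair pass: live replicas exchange newer versions.
        live = [r for r in self.replicas if r.up]
        for src in live:
            for dst in live:
                if src is not dst:
                    for key, (version, value) in src.data.items():
                        dst.put(key, version, value)


cluster = Cluster(["a", "b", "c"])
a, b, c = cluster.replicas

cluster.write("cart:42", ["book"])          # all three replicas see this
c.up = False                                # one server goes down
cluster.write("cart:42", ["book", "lamp"])  # the write still succeeds
c.up = True                                 # the server comes back

before_sync = c.get("cart:42")  # stale: replica c missed the second write
cluster.sync()
after_sync = c.get("cart:42")   # replica c has caught up with a and b
```

The design trade-off is visible in the middle of the run: the second write is accepted while one server is down, at the cost of that server briefly returning an older shopping cart once it is back.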
Dynamo is part of a class of non-relational databases known as distributed key-value store (DKVS) databases. DKVS is one of five classes that make up the NoSQL landscape, each with a different architecture and approach to managing data.
DKVS databases, also known as eventually consistent key-value store databases, are specifically designed to deal with data spread out over a large number of servers. These systems use distributed hash tables for their key-value stores, and because they're distributed, the database uses peer-to-peer relationships between servers, with no "master" control. Currently most of the databases in this class are either Dynamo itself or Dynamo-based implementations, such as the open source Project Voldemort, Dynomite, and KAI databases.
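One common way to build a distributed hash table with no "master" is a consistent-hash ring, in which every server can work out for itself which peers own a given key. The Python sketch below illustrates that general technique only. The HashRing class, the node names, the use of MD5 for ring positions, and the choice of eight virtual points per node are assumptions made for the example, not details of Dynamo, Project Voldemort, Dynomite, or KAI.

```python
import bisect
import hashlib


def ring_position(text):
    """Map a string to an integer position on the ring using MD5."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; any peer can compute key owners."""

    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes  # virtual points per node, to even out load
        self._points = []     # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def owners(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._points:
            return []
        start = bisect.bisect(self._points, (ring_position(key), ""))
        found = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(100)]

before = {k: ring.owners(k)[0] for k in keys}
ring.remove("node-c")  # simulate one server leaving the cluster
after = {k: ring.owners(k)[0] for k in keys}

# Only keys that lived on the departed node need a new primary owner.
moved = [k for k in keys if before[k] != after[k]]
```

The point of the ring layout is in the last line: when a server leaves, only the keys it held are reassigned, while every other key stays where it was. That property is what lets such a system grow or shrink across many servers without a central coordinator reshuffling all the data.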