Behind every great ecommerce website is a database, and in the early 2000s Amazon.com’s database was not keeping up with the company’s business.
Part of the problem was that Amazon didn’t have just one database – it relied on a series of them, each with its own responsibility. As the company headed toward becoming a $10 billion business, the number and size of its SQL databases exploded and managing them became more challenging. By the 2004 holiday shopping rush, outages became more common, caused in large part by overloaded SQL databases.
Something needed to change.
But instead of looking for a solution outside the company, Amazon developed its own database management system. It was a whole new kind of database, one that threw out the rules of traditional SQL varieties and was able to scale up and up and up. In 2007 Amazon shared its findings with the world: CTO Werner Vogels and his team released a paper titled “Dynamo – Amazon’s highly available key value store.” Some credit it with being the moment that the NoSQL database market was born.
+ MORE AT NETWORK WORLD: NoSQL takes the database market by storm +
The problem with SQL
The relational databases that have been around for decades and most commonly use the SQL programming language are ideal for organizing data in neat tables and running queries against them. Their success is undisputed: Gartner estimates the SQL database market to be $30 billion.
But in the early to mid-2000s, companies like Amazon, Yahoo and Google had data demands that SQL databases just didn’t address well. (To throw a bit of computer science at you, the CAP theorem states that it’s impossible for a distributed system, such as a big database, to have consistency, availability and fault tolerance. SQL databases prioritize consistency over speed and flexibility, which makes them great for managing core enterprise data such as financial transactions, but not other types of jobs as well.)
Take Amazon’s online shopping cart service, for example. Customers browse the ecommerce website and put something in their virtual shopping cart where it is saved and potentially purchased later. Amazon needs the data in the shopping cart to always be available to the customer; lost shopping cart data is a lost sale. But, it doesn't necessarily need every node of the database all around the world to have the most up-to-date shopping cart information for every customer. A SQL/relational system would spend enormous compute resources to make data consistent across the distributed system, instead of ensuring the information is always available and ready to be served to customers.
One of the fundamental tenets of Amazon’s Dynamo, and NoSQL databases in general, is that they sacrifice data consistency for availability. Amazon’s priority is to maintain shopping cart data and to have it served to customers very quickly. Plus, the system has to be able to scale to serve Amazon’s fast-growing demand. Dynamo solves all of these problems: It backs up data across nodes, and can handle tremendous load while maintaining fast and dependable performance.
“It was one of the first NoSQL databases,” explains Khawaja Shams, head of engineering at Amazon DynamoDB. “We traded off consistency and very rigid