Like most software engineers, I’ve spent a great deal of time mastering relational databases. Getting started with a distributed database (or NoSQL) can have your head spinning as you try to unlearn your relational ways and adapt to a new style of data storage and access. Here are some helpful resources to get you going and to help you choose the right platform.
Scaling enterprise SQL servers, keeping them highly available, and planning their disaster recovery is extremely challenging, and that effort is what drives most teams to a distributed solution. To understand why relational databases have such a tough time scaling, a great place to start is Brewer’s CAP Theorem.
The gist of the CAP Theorem is that a distributed system can guarantee two of the following three capabilities, but not all three:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
A traditional RDBMS chooses Consistency and Availability, and thus scales vertically but not horizontally.
NoSQL systems choose either Consistency and Partition tolerance or Availability and Partition tolerance, depending on the engine and the data model. Either combination allows horizontal scaling of both hardware and storage capacity, in exchange for (in some engines) eventual rather than immediate consistency.
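To make the trade-off concrete, here is a toy Python sketch of my own, not code from any real engine. The names ToyStore, Replica, Unavailable, and demo are all made up for illustration, and I'm assuming just two replicas, a single logical clock, and last-write-wins reconciliation. In "CP" mode the store refuses requests while the link between replicas is cut; in "AP" mode it keeps answering, lets the replicas drift apart, and only converges once the link is healed.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the CP store refuses a request during a partition."""


@dataclass
class Replica:
    # key -> (version, value); the version is a logical counter, not a wall clock
    data: dict = field(default_factory=dict)


class ToyStore:
    """Two replicas joined by a link that can be cut (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False
        self.clock = 0

    def _check_available(self):
        # Simplification: with only two replicas neither side has a majority,
        # so a CP store rejects every request while partitioned.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("partition: refusing request to stay consistent")

    def write(self, node, key, value):
        self._check_available()
        self.clock += 1
        record = (self.clock, value)
        self.replicas[node].data[key] = record
        if not self.partitioned:
            # Link is up, so the write is replicated immediately.
            self.replicas[1 - node].data[key] = record

    def read(self, node, key):
        self._check_available()
        record = self.replicas[node].data.get(key)
        return None if record is None else record[1]

    def heal(self):
        """Restore the link and reconcile replicas with last-write-wins."""
        self.partitioned = False
        a, b = self.replicas
        for key in set(a.data) | set(b.data):
            candidates = [r.data[key] for r in (a, b) if key in r.data]
            newest = max(candidates, key=lambda rec: rec[0])
            a.data[key] = b.data[key] = newest


def demo(mode):
    """Write, cut the link, write again, and report what each replica holds."""
    store = ToyStore(mode)
    store.write(0, "cart", "book")
    store.partitioned = True
    try:
        store.write(0, "cart", "book+pen")
        outcome = "accepted"
    except Unavailable:
        outcome = "rejected"
    store.heal()
    return outcome, store.read(0, "cart"), store.read(1, "cart")
```

Calling demo("CP") shows the second write being rejected, so both replicas still agree on the old value. Calling demo("AP") shows the write being accepted on one side only, with the other replica catching up after heal(); that catch-up window is the "eventual" in eventual consistency. Real engines use quorums, vector clocks, or other conflict-resolution schemes rather than this bare last-write-wins rule.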
Beyond understanding how NoSQL data modeling works, a big challenge is choosing the right NoSQL engine, and there are a vast number of them. To make matters more difficult, each is geared toward a specific set of strengths, so there is no general best solution; it depends on what your application will be doing.
As some bonus reading, if you’re considering a large-scale deployment of MongoDB, do yourself a favor and read this rant/warning before you do.