March 01, 2010, 12:26 PM —
Something interesting crossed the wires today... Twitter, the popular microblogging service, is switching to the Cassandra data management system from MySQL.
Like MySQL, Cassandra is open source. It was originally developed at Facebook and is now an Apache Software Foundation project. Facebook obviously uses Cassandra, as does Digg, which announced its switch to Cassandra in September last year.
Why is it so popular with the social media application set? MySQL seems to have trouble scaling to the needs of these high-traffic, high-storage sites. For Digg, "... the case of the traditional architecture, the lack of redundancy on the write masters is painful, and both approaches have significant management overhead to keep running," wrote Ian Eure back in September. The rest of Eure's blog post is extremely illuminating. He goes into a detailed explanation of the challenges of using a relational database like MySQL, but this passage sums it up well:
"The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes. This is completely wrong for large-scale web applications, where response time is critical. It’s made much worse by the serial nature of most applications. Each component of the page blocks on reads from the data store, as well as the completion of the operations that come before it."
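To make Eure's point a little more concrete, here is a rough Python sketch of the trade-off he describes. It is not Digg's or Twitter's actual code, and it does not use Cassandra or MySQL at all; the class names ReadTimeTimeline and WriteTimeTimeline are made up for illustration. The first class keeps normalized data and does the work on every read, the way a relational join would. The second does the work once at write time, so a read is a simple lookup.

```python
from collections import defaultdict


class ReadTimeTimeline:
    """Hypothetical relational-style store: cheap writes, work done on each read."""

    def __init__(self):
        self.follows = defaultdict(set)  # follower -> set of people they follow
        self.posts = []                  # (author, text) in posting order

    def follow(self, follower, followee):
        self.follows[follower].add(followee)

    def post(self, author, text):
        # Write is a single append.
        self.posts.append((author, text))

    def timeline(self, user):
        # Read scans every post and filters it, standing in for a JOIN.
        followees = self.follows[user]
        return [text for author, text in reversed(self.posts) if author in followees]


class WriteTimeTimeline:
    """Hypothetical denormalized store: work done on write, reads are lookups."""

    def __init__(self):
        self.followers = defaultdict(set)   # followee -> set of followers
        self.timelines = defaultdict(list)  # user -> precomputed timeline, newest first

    def follow(self, follower, followee):
        # Simplification: earlier posts are not backfilled on a new follow.
        self.followers[followee].add(follower)

    def post(self, author, text):
        # Write fans the post out to every follower's stored timeline.
        for follower in self.followers[author]:
            self.timelines[follower].insert(0, text)

    def timeline(self, user):
        # Read is a direct lookup with no filtering or joining.
        return list(self.timelines[user])
```

Both classes return the same timeline for the same sequence of follows and posts. The difference is where the cost lands: the first pays on every page view, the second pays once per post and uses more storage.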
The same reasoning seems to be behind Twitter's move to Cassandra this week. In fact, when Ryan King, a Twitter software engineer, was interviewed Feb. 23 about a potential move to Cassandra, he laid out the options just as Eure did in September: either Twitter would have to move to a "more Automated sharded MySQL setup" or switch to one of a new class of non-relational databases that have much more appeal for web application hosts. These include HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, Tokyo Cabinet/Tyrant, and Dynomite.
Informally, this set is referred to as the "NoSQL" family of databases, in that they all shift away from the relational model that SQL databases use.
With its flexible, scalable architecture and its clustering capabilities, Cassandra is well suited to the kind of work web applications demand. So, are MySQL and the other SQL databases in trouble? Not likely... there are still some things the non-relational databases can't do very well yet. Data handling is not as rigorous, since NoSQL databases tend to trade strict consistency and transactional guarantees for performance.
I would keep an eye on this new family of databases... if the performance and data-handling issues can be worked out, the NoSQL movement may represent a new future for data management, especially since the Web seems to be the most popular platform for apps.