MySQL Cluster delivered fewer than 10 ops/sec, with a latency of 400 ms. Because MySQL Cluster partitions data across the nodes in the cluster, it uses an optimizer to translate SQL commands into a query plan, and the execution of that plan is divided among multiple nodes. For range scans, a B-tree index is used to evaluate column comparisons in expressions such as >, <, or BETWEEN.
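To make the idea of a distributed range scan concrete, here is a minimal Python sketch. It is a conceptual model, not MySQL Cluster's actual implementation. It assumes rows are hash-partitioned on the primary key across a fixed number of partitions, and a sorted list searched with binary search stands in for the B-tree. The class and method names are hypothetical. Each partition evaluates the range predicate against its own ordered index, and a coordinator merges the sorted partial results.

```python
import bisect
import heapq
from typing import List, Optional, Tuple


class OrderedIndex:
    """Sorted (key, row) pairs standing in for a tree-based ordered index."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._rows: List[str] = []

    def insert(self, key: int, row: str) -> None:
        # Keep keys sorted so that a range predicate becomes two binary searches.
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def scan(self, low: Optional[int], high: Optional[int]) -> List[Tuple[int, str]]:
        # Inclusive bounds: (low, high) models BETWEEN; None leaves a side open,
        # so (low, None) models key >= low and (None, high) models key <= high.
        lo = 0 if low is None else bisect.bisect_left(self._keys, low)
        hi = len(self._keys) if high is None else bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[lo:hi], self._rows[lo:hi]))


class PartitionedTable:
    """Rows hash-partitioned on the primary key, one ordered index per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [OrderedIndex() for _ in range(num_partitions)]

    def insert(self, key: int, row: str) -> None:
        # Simple modulo hashing decides which partition (node) owns the row.
        self.partitions[key % len(self.partitions)].insert(key, row)

    def range_scan(self, low: Optional[int], high: Optional[int]) -> List[Tuple[int, str]]:
        # Each partition scans its own index; the coordinator merges the
        # already-sorted partial results into one ordered result set.
        partial = [p.scan(low, high) for p in self.partitions]
        return list(heapq.merge(*partial))


table = PartitionedTable(num_partitions=4)
for k in range(20):
    table.insert(k, f"row{k}")

print([k for k, _ in table.range_scan(5, 9)])     # BETWEEN 5 AND 9 -> [5, 6, 7, 8, 9]
print([k for k, _ in table.range_scan(None, 2)])  # key <= 2 -> [0, 1, 2]
```

Because each partition returns its matches already in key order, the coordinator only needs a k-way merge rather than a full sort.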
Sharded MySQL relies on key hashing on the connector side, so it does not support true range scans across the cluster. While a single shard handled about 10 ops/sec, the whole sharded setup reached nearly 40 ops/sec, with latencies of up to 400 ms. MyISAM caches index blocks but not data blocks, so data blocks have to be re-read from the OS buffer cache, which can add overhead.
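The following hypothetical Python sketch shows why connector-side key hashing rules out true range scans. It assumes the connector routes each key using CRC32 modulo the number of shards; the hash function the real connector used is not described here, and all names are illustrative. A point lookup touches exactly one shard, but because hashing scatters neighboring keys, a range query has to be sent to every shard and the results merged on the client.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def shard_for(key: str, num_shards: int) -> int:
    # Hypothetical routing rule: CRC32 of the key, modulo the shard count.
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedClient:
    """Client-side sharding: every shard is an independent key-value store."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        # A point lookup is routed to exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def range_scan(self, start: str, end: str) -> List[Tuple[str, str]]:
        # Hashing does not preserve key order, so no single shard owns the
        # range: query every shard (scatter) and sort on the client (gather).
        hits: List[Tuple[str, str]] = []
        for shard in self.shards:
            hits.extend((k, v) for k, v in shard.items() if start <= k <= end)
        return sorted(hits)


client = ShardedClient(num_shards=4)
for i in range(20):
    client.put(f"user{i:02d}", f"value{i}")

print(client.get("user07"))                                    # value7
print([k for k, _ in client.range_scan("user05", "user09")])  # user05 .. user09
```

Point reads stay single-shard, while every range scan fans out to all shards.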
The Riak Bitcask storage engine does not support range scans. Range queries are possible through secondary indexes with eleveldb, using the special $key index that refers to the primary key. However, eleveldb performed poorly: its performance started to degrade after 50,000,000 records had been imported, so we fell back to Bitcask.
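As a rough illustration, the sketch below builds a range query over the $key index. It assumes Riak's HTTP secondary-index endpoint of the form /buckets/<bucket>/index/<index>/<start>/<end> on the default HTTP port 8098. The host, bucket name and key bounds are hypothetical. The local emulation assumes, as we understand Riak's semantics, that keys are compared as binary strings with both bounds inclusive. That ordering matters for YCSB-style keys such as user10 and user2.

```python
from typing import Iterable, List
from urllib.parse import quote


def key_range_url(host: str, port: int, bucket: str, start: str, end: str) -> str:
    """Build an HTTP range query over Riak's special $key secondary index."""
    # $key is sent literally; the bucket name and bounds are percent-encoded.
    return (
        f"http://{host}:{port}/buckets/{quote(bucket, safe='')}"
        f"/index/$key/{quote(start, safe='')}/{quote(end, safe='')}"
    )


def emulate_key_range(keys: Iterable[str], start: str, end: str) -> List[str]:
    """Local stand-in for the query: string comparison, both bounds inclusive."""
    return sorted(k for k in keys if start <= k <= end)


print(key_range_url("localhost", 8098, "usertable", "user1000", "user2000"))
# Lexicographic order puts "user10" between "user1" and "user2".
print(emulate_key_range(["user1", "user10", "user2", "user9"], "user1", "user2"))
```

Such a query only works when the bucket is stored on a backend that supports secondary indexes, such as eleveldb. Bitcask does not support them.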
* Workload G: Insert-mostly mode. Settings for the workload:
  1) Insert/Read: 90/10
  2) Latest request distribution
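Workload G is not one of YCSB's bundled core workloads. The Python sketch below therefore assumes one way it could be expressed with YCSB's CoreWorkload property names (insertproportion, readproportion, requestdistribution and so on). The workload class name differs between YCSB releases. The sketch also simulates the 90/10 operation mix and a simplified "latest" distribution that favors recently inserted keys with Zipf-like weights, using exponent 0.99 (borrowed from YCSB's Zipfian default). YCSB's own latest generator differs in detail.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical "Workload G" settings using YCSB CoreWorkload property names.
# Older YCSB releases name the class com.yahoo.ycsb.workloads.CoreWorkload.
WORKLOAD_G: Dict[str, str] = {
    "workload": "site.ycsb.workloads.CoreWorkload",
    "insertproportion": "0.9",
    "readproportion": "0.1",
    "updateproportion": "0",
    "scanproportion": "0",
    "readmodifywriteproportion": "0",
    "requestdistribution": "latest",
}


def to_properties(props: Dict[str, str]) -> str:
    """Render settings as key=value lines, the format of a YCSB workload file."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def latest_key(rng: random.Random, last_inserted: int, theta: float = 0.99) -> int:
    """Simplified 'latest' distribution: Zipf-like weights over recency rank."""
    n = last_inserted + 1
    # Rank 0 is the newest record and receives the largest weight.
    weights = [1.0 / (rank + 1) ** theta for rank in range(n)]
    rank = rng.choices(range(n), weights=weights, k=1)[0]
    return last_inserted - rank


def simulate(num_ops: int, initial_records: int, seed: int = 42) -> Tuple[Counter, List[int], int]:
    """Replay the 90/10 insert/read mix against a growing key space."""
    rng = random.Random(seed)
    insert_share = float(WORKLOAD_G["insertproportion"])
    last = initial_records - 1
    ops: Counter = Counter()
    read_keys: List[int] = []
    for _ in range(num_ops):
        if rng.random() < insert_share:
            last += 1  # an insert appends the next key
            ops["insert"] += 1
        else:
            read_keys.append(latest_key(rng, last))
            ops["read"] += 1
    return ops, read_keys, last


print(to_properties(WORKLOAD_G), end="")
ops, read_keys, last = simulate(num_ops=2000, initial_records=100)
print(dict(ops), "last key:", last)
```

Rebuilding the weight list on every read keeps the sketch short but costs O(n) per call. A production generator would sample without materializing all the weights.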
HBase showed the best results under a workload with large volumes of writes, and Cassandra came second. The NDB engine of MySQL Cluster also handled intensive writes very well.
As you can see, there is no perfect NoSQL database. Every database has advantages and disadvantages, which become more or less important depending on your preferences and the type of tasks you need to perform.
For example, a database can demonstrate excellent performance, but once the number of records exceeds a certain limit, its speed falls dramatically. This means that such a solution can be a good fit for moderate data loads and extremely fast computations, but not for jobs that require a lot of reads and writes. Database performance also depends on the capacity of your hardware.
It was not possible to include all of the performance diagrams and describe everything in one article. You can download the full version of the research, which contains separate chapters dedicated to every database, YCSB and Amazon EC2 configuration details, and an appendix with other performance diagrams, at http://altoros.com/nosql-research.
We hope this research will be useful both to developers working with NoSQL solutions and to customers trying to choose a database. Altoros's R&D team will regularly revise and update this research to cover new databases and new releases of the most popular products.
About the author: Sergey Bushik is a senior R&D engineer at Altoros. He has more than seven years of experience implementing Java-based projects, including big data processing, data mining and Hadoop computations. Sergey holds a number of Java certificates and is a Sun Certified Enterprise Architect for the Java Platform. He is a regular speaker at international conferences; most recently, he delivered sessions at Big Data Meetup (Sunnyvale, Calif.), GOTO Copenhagen 2012 and Hadoop Evening (Eastern Europe), among others.
This story, "A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak" was originally published by Network World.