A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

By Sergey Bushik, senior R&D engineer at Altoros Systems Inc., Network World |  Software, Cassandra, databases

HBase performed better than Cassandra in range scans. HBase scanning is a form of hierarchical fast merge-sort operation performed by HRegionScanner. It merges the results received from HStoreScanners (one per family), which, in their turn, merge the results received from HStoreFileScanners (one for each file in the family). If caching is turned on, the server will simply provide the number of specified records instead of bouncing back to the HRegionServer to process every record.

Cassandra's scan performance with a random partitioner has improved considerably compared to Version 0.6, where this feature was initially introduced.

Cassandra's SSTable is a sorted strings table that can be described as a file of key-value string pairs sorted by keys and key-value pairs written in a particular order. To achieve maximum performance during range scans, we had to use an order-preserving partitioner. Scanning over ranges of order-preserving rows is super-fast. It is similar to moving a cursor through a continuous index. However, the database cannot distribute individual keys and corresponding rows over the cluster evenly, thus a random partitioner is used to ensure even data distribution. This is the default partitioning strategy in Cassandra. Random partitioning ensures good load balancing and provides some additional speed in range scans with an order preserving partitioner.

In MongoDB 2.5, the table scan triggered by the { "_id":{"$gte": startKey}} query showed a maximum throughput of 20 ops/sec with a latency of H 1 sec.

The performance of MySQL Cluster was under 10 ops/sec with a latency of 400 ms. It is partitioned over the nodes in the cluster, so the system uses an optimizer to translate SQL commands into a query plan. The execution of this plan is divided among multiple nodes. For range scans, a B-tree index is used to make column comparisons in such expressions as >, <, or BETWEEN.

Sharded MySQL is based on key hashing on the connector side and does not support true range scans over a cluster. While a single shard did about 10 ops/sec, the whole sharded setup showed near 40 ops/sec with a latency of up to 400 ms. MyISAM caches index blocks but not data blocks. There can be an overhead due to re-reading data blocks from the OS buffer cache.

The Riak bitcask storage engine does not support range scans. This can be done through secondary indexes with eleveldb and special $key index referring to the primary key. Eleveldb showed insufficient performance that started to degrade after 50,000,000 records had been imported and we fell back to bitcask.

* Workload G: Insert-mostly mode. Settings for the workload: 1) Insert/Read: 90/10 2) Latest request distribution


Originally published on Network World |  Click here to read the original story.
Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Answers - Powered by ITworld

Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Ask a Question
randomness