A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

By Sergey Bushik, senior R&D engineer at Altoros Systems Inc., Network World

Sharded MySQL showed the best performance in reads. MongoDB -- accelerated by the "memory mapped files" type of cache -- was close to that result. Memory-mapped files were used for all disk I/O in MongoDB. Cassandra's key and row caching enabled very fast access to frequently requested data. With the off-heap row caching feature added in Version 0.8, it showed excellent read performance while using less per-row memory. The key cache held locations of keys in memory on a per-column family basis and defined the offset for the SSTable where the rows were stored. With a key cache, there was no need to look for the position of the row in the SSTable index file. Thanks to the row cache, we did not have to read rows from the SSTable data file. In other words, each key cache hit saved us one disk seek and each row cache hit saved two disk seeks. In HBase, random read performance was slower. However, Cassandra and HBase can provide faster data access with per-column-family compression.
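
The seek arithmetic above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual read path: the class `SSTableReadModel`, the helper `seeks_on_repeat_read`, and the dictionaries standing in for the SSTable index and data files are all hypothetical names introduced for illustration. It charges one simulated seek for the index lookup and one for the data file read, as the paragraph describes.

```python
class SSTableReadModel:
    """Toy model that counts simulated disk seeks per read (illustrative only)."""

    def __init__(self, rows, use_key_cache=True, use_row_cache=True):
        # Simulated on-disk structures: the index file maps key -> offset,
        # the data file maps offset -> row.
        self.index_file = {key: offset for offset, key in enumerate(rows)}
        self.data_file = {offset: rows[key] for key, offset in self.index_file.items()}
        self.key_cache = {} if use_key_cache else None
        self.row_cache = {} if use_row_cache else None
        self.seeks = 0

    def read(self, key):
        # Row cache hit: no index lookup and no data file read (two seeks saved).
        if self.row_cache is not None and key in self.row_cache:
            return self.row_cache[key]
        # Key cache hit: the row offset is already known (one seek saved).
        if self.key_cache is not None and key in self.key_cache:
            offset = self.key_cache[key]
        else:
            self.seeks += 1  # seek in the index file
            offset = self.index_file[key]
            if self.key_cache is not None:
                self.key_cache[key] = offset
        self.seeks += 1  # seek in the data file
        row = self.data_file[offset]
        if self.row_cache is not None:
            self.row_cache[key] = row
        return row


def seeks_on_repeat_read(use_key_cache, use_row_cache):
    """Return the number of seeks charged for the second read of the same key."""
    model = SSTableReadModel({"user1": "row-1", "user2": "row-2"},
                             use_key_cache, use_row_cache)
    model.read("user1")          # first read warms whichever caches are enabled
    before = model.seeks
    model.read("user1")          # second read benefits from the caches
    return model.seeks - before
```

In this model a repeated read costs two seeks with no caches, one seek with only the key cache, and none once the row cache holds the row.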

* Workload B: Update. Thanks to deferred log flushing, HBase showed very high throughput with extremely small latency under heavy writes. With deferred log flush turned on, the edits were first committed to the memstore. Then the aggregated edits were flushed to HLog asynchronously. On the client side, the HBase write buffer cached writes with the autoFlush option set to false, which also improved performance greatly. For durability, HBase confirms every write after its write-ahead log reaches a particular number of in-memory HDFS replicas. HBase's write latency with memory commit was roughly equal to the latency of data transmission over the network. Cassandra demonstrated great write throughput, since it first writes to the commit log -- using the append method, which is a pretty fast operation -- and then to a per-column-family memory store called Memtable.
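
The two write orderings described above can be contrasted in a small sketch. It is a simplified model, not code from either database: `AppendFirstStore` and `DeferredFlushStore` are hypothetical names, plain lists and dictionaries stand in for the commit log, HLog, Memtable and memstore, and the log flush is an explicit method call here, whereas the text describes it as asynchronous.

```python
class AppendFirstStore:
    """Cassandra-style ordering as described: append to the log, then update memory."""

    def __init__(self):
        self.commit_log = []   # stand-in for the append-only commit log
        self.memtable = {}     # stand-in for the per-column-family Memtable

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value            # then the in-memory store


class DeferredFlushStore:
    """HBase-style deferred log flush as described: memory first, log in batches."""

    def __init__(self):
        self.memstore = {}     # stand-in for the memstore
        self.pending = []      # edits not yet written to the log
        self.log = []          # stand-in for HLog

    def write(self, key, value):
        self.memstore[key] = value            # the edit is visible immediately
        self.pending.append((key, value))     # the log write is deferred

    def flush_log(self):
        """Write all aggregated edits to the log in one batch.

        Called explicitly in this sketch; a real system would run it in the background.
        """
        flushed = len(self.pending)
        self.log.extend(self.pending)
        self.pending.clear()
        return flushed
```

The trade-off the sketch makes visible is that edits sitting in `pending` would be lost if the process stopped before `flush_log` ran, which is the price of the lower write latency.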

* Workload C: Read-only. Settings for the workload: 1) Read/update ratio: 100/0 2) Zipfian request distribution

This read-only workload simulated a data caching system. The data was stored outside the system, while the application was only reading it. Thanks to B-tree indexes, sharded MySQL became the winner in this competition.
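
To make the Zipfian request distribution in the Workload C settings concrete, here is a minimal sketch of how such a skewed read pattern could be generated. It does not reproduce the benchmark tool's own generator: the function names, the exponent value of 0.99, and the fixed seed are assumptions chosen for illustration.

```python
import random
from collections import Counter


def zipfian_weights(n_keys, exponent=0.99):
    """Weight of the key at popularity rank r is proportional to 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_keys + 1)]


def sample_reads(n_keys, n_requests, exponent=0.99, seed=7):
    """Draw key indexes for a read-only workload; index 0 is the most popular key."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs repeatable
    weights = zipfian_weights(n_keys, exponent)
    return rng.choices(range(n_keys), weights=weights, k=n_requests)


def hottest_keys(samples, top=5):
    """Return the most frequently requested keys with their request counts."""
    return Counter(samples).most_common(top)
```

With a skew like this, a small set of keys receives most of the reads, which is the access pattern that makes a cache-like workload realistic.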

* Workload E: Scanning short ranges. Settings for the workload: 1) Read/update/insert ratio: 95/0/5 2) Latest request distribution 3) Max scan length: 100 records 4) Scan length distribution: uniform
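
The Workload E settings can be read as an operation mix, sketched below. Several points are assumptions rather than facts from the benchmark: the 95% read share is treated as short-range scans (following the workload title), "latest" is taken to mean that recently inserted records are requested most often, an exponential decay stands in for whatever recency skew the real tool uses, and the function name, record counts and seed are hypothetical.

```python
import random


def workload_e_ops(n_ops, initial_records=1000, max_scan_length=100, seed=42):
    """Generate (operation, start_key, scan_length) tuples for a scan-heavy mix."""
    rng = random.Random(seed)
    record_count = initial_records
    ops = []
    for _ in range(n_ops):
        if rng.random() < 0.95:
            # Assumed "latest" behaviour: pick a start key close to the newest record.
            # The exponential decay with a mean of 50 records back is illustrative.
            back = min(int(rng.expovariate(1 / 50.0)), record_count - 1)
            start = record_count - 1 - back
            # Uniform scan length between 1 and the configured maximum.
            length = rng.randint(1, max_scan_length)
            ops.append(("scan", start, length))
        else:
            # Inserts append a new record, which then becomes the newest key.
            ops.append(("insert", record_count, None))
            record_count += 1
    return ops
```

Each scan starts near the most recently inserted keys and covers at most 100 records, so inserts continually shift where the read load lands.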


Originally published on Network World.