When we started our research into NoSQL databases, we wanted to get unbiased results that would show which solution is best suitable for each particular task. That is why we decided to test performance of each database under different types of loads and let the users decide what product better suits their needs.
We started with measuring the load phase, during which 100 million records, each containing 10 fields of 100 randomly generated bytes, were imported to a four-node cluster.
HBase demonstrated by far the best writing speed. With pre-created regions and deferred log flush enabled, it reached 40K ops/sec. Cassandra also showed great performance during the loading phase with around 15K ops/sec. The data is first saved to the commit log, using the append method, which is a fast operation. Then it is written to a per-column family memory store called a Memtable. Once the Memtable becomes full, the data is saved to disk as an SSTable. In the "just in-memory" mode, MySQL Cluster could show much better results, by the way.
* Workload A: Update-heavily mode. Workload A is an update-heavily scenario that simulates the database work, during which typical actions of an e-commerce solution user are recorded. Settings for the workload: 1) Read/update ratio: 50/50 2) Zipfian request distribution
During updates, HBase and Cassandra went far ahead from the main group with the average response latency time not exceeding two milliseconds. HBase was even faster. HBase client was configured with AutoFlush turned off. The updates aggregated in the client buffer and pending writes flushed asynchronously, as soon as the buffer became full. To accelerate updates processing on the server, the deferred log flush was enabled and WAL edits were kept in memory during the flush period.
Cassandra wrote the mutation to the commit log for the transaction purposes and then an in-memory Memtable was updated. This is a slower, but safer scenario if compared to the HBase deferred log flushing.
* Workload A: Read. During reads, per-column family compression provides HBase and Cassandra with faster data access. HBase was configured with native LZO and Cassandra with Google's Snappy compression codecs. Although the computation ran longer, the compression reduces the number of bytes read from the disk.
* Workload B: Read-heavy mode. Workload B consisted of 95% of reads and 5% of writes. Content tagging can serve as an example task matching this workload; adding a tag is an update, but most operations imply reading tags. Settings for the "read-mostly" workload: 1) Read/update ratio: 95/5 2) Zipfian request distribution