A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

By Sergey Bushik, senior R&D engineer at Altoros Systems Inc., Network World

Database performance was defined as the speed at which a database performs basic operations. A basic operation is an action performed by the workload executor, which drives multiple client threads. Each thread executes a sequential series of operations by calling the database interface layer. These calls serve two purposes: loading the database (the load phase) and executing the workload (the transaction phase). The threads throttle the rate at which they generate requests, so the offered load against the database can be controlled directly. In addition, the threads measure the latency and achieved throughput of their operations and report these measurements to the statistics module.
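The client model described above can be sketched in a few lines. YCSB itself is written in Java, so the Python below is not its code. It is a minimal illustration under stated assumptions, and the names `ClientThread` and `run_workload` are hypothetical. Each thread:

- schedules operation i at `start + i * interval`, which caps its request rate;
- times each call;
- hands its measurements to a simplified aggregation step standing in for the statistics module.

```python
import threading
import time
from typing import Callable, Dict, List


class ClientThread(threading.Thread):
    """Hypothetical workload-executor thread: issues operations one after
    another, throttled to a target rate, and records each operation's latency."""

    def __init__(self, operation: Callable[[int], None], op_count: int,
                 target_ops_per_sec: float) -> None:
        super().__init__()
        self.operation = operation                 # stand-in for a DB-interface call
        self.op_count = op_count
        self.interval = 1.0 / target_ops_per_sec   # seconds between scheduled starts
        self.latencies: List[float] = []
        self.elapsed = 0.0

    def run(self) -> None:
        start = time.perf_counter()
        for i in range(self.op_count):
            # Throttle: sleep until this operation's scheduled start time.
            delay = start + i * self.interval - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
            t0 = time.perf_counter()
            self.operation(i)
            self.latencies.append(time.perf_counter() - t0)
        self.elapsed = time.perf_counter() - start


def run_workload(operation: Callable[[int], None], threads: int,
                 ops_per_thread: int, target_total_ops: float) -> Dict[str, float]:
    """Run `threads` throttled clients and aggregate their measurements,
    playing the role of a (much simplified) statistics module."""
    per_thread_rate = target_total_ops / threads
    workers = [ClientThread(operation, ops_per_thread, per_thread_rate)
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    latencies = [lat for w in workers for lat in w.latencies]
    wall = max(w.elapsed for w in workers)  # approximate run duration
    return {
        "ops": len(latencies),
        "throughput_ops_per_sec": len(latencies) / wall,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
    }


# Usage: an in-memory dict stands in for the database under test.
store: Dict[int, bytes] = {}
stats = run_workload(lambda i: store.__setitem__(i, b"x" * 100),
                     threads=4, ops_per_thread=50, target_total_ops=2000)
print(stats)
```

Because each thread waits for its scheduled slot, achieved throughput stays at or below the target, even if the store could go faster. This is what lets the offered load be controlled independently of the database's speed.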

The performance of the system was evaluated under different workloads:

Workload A: Update heavy
Workload B: Read mostly
Workload C: Read only
Workload D: Read latest
Workload E: Scan short ranges
Workload F: Read-modify-write
Workload G: Write heavy
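Each workload differs mainly in its mix of operation types. The article does not list the exact proportions it used. The sketch below therefore assumes the defaults from the YCSB core workload files (workloada through workloadf) for workloads A to F. Workload G is not one of the YCSB core workloads, so it is left out rather than guessed. `WORKLOAD_MIXES` and `choose_operation` are hypothetical names for illustration.

```python
import random
from collections import Counter
from typing import Dict

# ASSUMPTION: mixes copied from the YCSB core workload files (workloada..workloadf);
# the article does not state the mixes it used. Workload G is not a YCSB core
# workload, so no mix is given for it here.
WORKLOAD_MIXES: Dict[str, Dict[str, float]] = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},  # YCSB skews D's reads toward recent inserts
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.50, "readmodifywrite": 0.50},
}


def choose_operation(workload: str, rng: random.Random) -> str:
    """Pick the next operation type at random, weighted by the workload's mix."""
    mix = WORKLOAD_MIXES[workload]
    ops = list(mix)
    return rng.choices(ops, weights=[mix[op] for op in ops], k=1)[0]


rng = random.Random(42)
counts = Counter(choose_operation("B", rng) for _ in range(10_000))
print(counts)  # roughly 95% reads, 5% updates
```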

Each workload was defined by:

1) The number of records manipulated (read or written)
2) The number of columns per record
3) The total size of a record or the size of each column
4) The number of threads used to load the system

This research also specifies configuration settings for each type of workload. We used the following default settings:

1) 100,000,000 records manipulated
2) A total record size of 1 KB
3) 10 fields of 100 bytes each per record
4) Multithreaded communication with the system (100 threads)
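The default settings fit together arithmetically. Ten fields of 100 bytes give 1,000 bytes per record, which the article rounds to 1 KB. Across 100,000,000 records, that is about 100 GB of raw field data. The sketch below captures this as a small settings object. The class name `WorkloadSettings` is hypothetical. Its fields loosely mirror YCSB's `recordcount`, `fieldcount`, `fieldlength` and `threadcount` properties. The size figure counts field payload only, ignoring keys, replication and storage-engine overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSettings:
    """Hypothetical container for the article's default workload settings."""
    record_count: int = 100_000_000   # records manipulated
    field_count: int = 10             # fields (columns) per record
    field_length: int = 100           # bytes per field
    threads: int = 100                # client threads

    @property
    def record_size_bytes(self) -> int:
        return self.field_count * self.field_length

    @property
    def dataset_size_bytes(self) -> int:
        # Raw field payload only: no keys, replicas, or per-record overhead.
        return self.record_count * self.record_size_bytes


s = WorkloadSettings()
print(s.record_size_bytes)           # 1000
print(s.dataset_size_bytes / 10**9)  # 100.0 (decimal GB)
```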

Testing environment

To provide verifiable results, benchmarking was performed on Amazon Elastic Compute Cloud (EC2) instances. The Yahoo! Cloud Serving Benchmark (YCSB) client was deployed on one Amazon Large Instance:

- 7.5GB of memory
- four EC2 Compute Units (two virtual cores with two EC2 Compute Units each)
- 850GB of instance storage
- 64-bit platform
- high I/O performance
- EBS-Optimized (500Mbps)
- API name: m1.large

Each of the NoSQL databases was deployed on a four-node cluster in the same geographical region on Amazon Extra Large Instances:

- 15GB of memory
- eight EC2 Compute Units (four virtual cores with two EC2 Compute Units each)
- 1690GB of instance storage
- 64-bit platform
- high I/O performance
- EBS-Optimized (1000Mbps)
- API name: m1.xlarge
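A back-of-the-envelope check puts the cluster size in context. It uses only figures stated in this article: four nodes, 15GB of memory and 1690GB of instance storage per node, and 100,000,000 records of 10 × 100 bytes. Sizes are treated as decimal GB, which is close enough for an order-of-magnitude comparison. The raw payload alone is larger than the four nodes' combined RAM, so the dataset cannot be held entirely in memory, and any replication would only enlarge it.

```python
# Figures from the article's setup; sizes treated as decimal GB (approximate).
NODES = 4
NODE_MEMORY_GB = 15
NODE_INSTANCE_STORAGE_GB = 1690
RECORDS = 100_000_000
RECORD_BYTES = 10 * 100  # 10 fields x 100 bytes

# Payload only: ignores keys, replication, and storage-engine overhead.
raw_dataset_gb = RECORDS * RECORD_BYTES / 10**9
cluster_memory_gb = NODES * NODE_MEMORY_GB
cluster_storage_gb = NODES * NODE_INSTANCE_STORAGE_GB

print(raw_dataset_gb, cluster_memory_gb, cluster_storage_gb)  # 100.0 60 6760
print(raw_dataset_gb > cluster_memory_gb)                      # True
```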

Amazon EC2 is often criticized for high I/O wait times and comparatively slow EBS performance. To mitigate these drawbacks, the EBS volumes were assembled into a RAID 0 (striped) array, which provided up to two times higher performance.
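Striping helps because RAID 0 deals consecutive fixed-size chunks of the logical volume round-robin across its member disks. A large sequential request is therefore served by several volumes in parallel. The article does not say how many EBS volumes were striped or what chunk size was used. The 4 disks and 64 KiB chunk below are hypothetical values, and `stripe_location` is an illustrative helper, not part of any RAID tool. Note that RAID 0 offers no redundancy: losing any member volume loses the array.

```python
from typing import Tuple


def stripe_location(logical_offset: int, stripe_unit: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical byte offset in a RAID 0 array to (disk index, offset on that disk)."""
    stripe_index = logical_offset // stripe_unit  # which chunk of the logical volume
    within = logical_offset % stripe_unit         # byte position inside that chunk
    disk = stripe_index % num_disks               # chunks are dealt round-robin
    row = stripe_index // num_disks               # complete stripes before this chunk
    return disk, row * stripe_unit + within


CHUNK = 64 * 1024  # hypothetical 64 KiB chunk size
DISKS = 4          # hypothetical number of EBS volumes

print(stripe_location(5 * CHUNK + 10, CHUNK, DISKS))  # (1, 65546)
# A 256 KiB sequential read touches every member disk once:
print({stripe_location(off, CHUNK, DISKS)[0] for off in range(0, 4 * CHUNK, CHUNK)})  # {0, 1, 2, 3}
```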

The results


Originally published on Network World.