TPC takes the measure of big data systems

The fresh TPCx-HS benchmark could provide an apples-to-apples comparison of commercial Hadoop systems

IDG News Service

Comparing commercial Hadoop-based big data analysis systems might get a little easier, thanks to a new benchmark from the Transaction Processing Performance Council (TPC).

The TPCx-HS benchmark, posted Monday, offers a performance assessment of Hadoop-based systems.

"There has been a lot of push from our customers for a standard to objectively measure performance and price performance of big data systems," said Raghunath Nambiar, who is the chairman of the TPCx-HS committee, as well as a distinguished engineer at Cisco.

The worldwide IT market for big data analysis should swell to more than US$240 billion by 2016, according to IDC. Companies such as IBM and Hewlett-Packard offer prepackaged systems running Hadoop, currently the most popular of the big data platforms being tested and used in the enterprise.

Today, vendors may publish performance metrics for their Hadoop systems, but each company uses its own benchmark, making it difficult for customers to compare systems.

TPC hopes that Hadoop system vendors will run its benchmark on their own systems, allowing potential customers to compare price-performance directly across different offerings.

TPCx-HS "defines a level playing field. The number you get from vendor X can be fairly compared to the number from vendor Y," Nambiar said.

A benchmark kit, which can be downloaded from the TPC site, tests the overall performance of a Hadoop system. It includes the specification and user documentation, as well as scripts to run the benchmark code and a Java program to execute the benchmark load.

The benchmark itself measures how quickly an Apache Hadoop system can sort data, based on the widely used TeraSort workload. Vendors can tune their systems either by optimizing the software or by running the fastest hardware available.
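The generate, sort and validate flow behind a TeraSort-style run can be illustrated at toy scale in a single Python process. This is a minimal sketch, not the TPC kit: it assumes 100-byte records whose first 10 bytes are the sort key (the layout used by Hadoop's TeraSort example), and the function names `generate`, `sort_records` and `validate` are hypothetical. A real run spreads each phase across the whole cluster.

```python
from __future__ import annotations

import os

# Assumption: 100-byte records with a 10-byte key, as in Hadoop's TeraSort example.
RECORD_SIZE = 100
KEY_SIZE = 10


def generate(num_records: int) -> list[bytes]:
    """Generate random fixed-size records (a toy stand-in for the data generator)."""
    return [os.urandom(RECORD_SIZE) for _ in range(num_records)]


def sort_records(records: list[bytes]) -> list[bytes]:
    """Sort records by their 10-byte key, compared byte by byte as unsigned values."""
    return sorted(records, key=lambda r: r[:KEY_SIZE])


def validate(records: list[bytes]) -> bool:
    """Check that every record's key is <= the key of the record after it."""
    return all(
        records[i][:KEY_SIZE] <= records[i + 1][:KEY_SIZE]
        for i in range(len(records) - 1)
    )


if __name__ == "__main__":
    data = sort_records(generate(1_000))
    print(validate(data))  # True
```

Keeping validation as a separate pass mirrors the idea that a fast sort only counts if its output can be checked independently.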

Using the benchmark, a tester can choose among several sizes of machine-generated data set, currently ranging from 1 terabyte to 10,000 terabytes.
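To get a feel for what those sizes mean, a data set size can be converted into a record count. This is a minimal sketch assuming decimal terabytes (10^12 bytes) and 100-byte records as in Hadoop's TeraSort example; the function name `records_for` is hypothetical, and the sizes in the loop are only sample points within the 1 to 10,000 TB range.

```python
# Assumptions: 1 TB = 10**12 bytes, and every record is 100 bytes long.
RECORD_SIZE_BYTES = 100
BYTES_PER_TB = 10**12


def records_for(terabytes: int) -> int:
    """Return how many fixed-size records make up a data set of the given size."""
    return terabytes * BYTES_PER_TB // RECORD_SIZE_BYTES


if __name__ == "__main__":
    # Illustrative sizes only, spanning the smallest and largest data sets.
    for tb in (1, 10, 100, 1_000, 10_000):
        print(f"{tb:>6} TB -> {records_for(tb):,} records")
```

Under these assumptions the smallest data set already holds ten billion records, and the largest holds one hundred trillion.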

The benchmark provides a score for overall performance, as well as a price-performance score that relates the system's performance to its cost. A third, optional test measures the system's energy efficiency.
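The relationship between the three scores can be sketched as simple ratios. This is an illustration under stated assumptions, not the specification's exact formulas: it assumes performance is terabytes sorted per elapsed hour, price-performance is total system cost divided by that figure, and energy efficiency is average power draw divided by it. The class and function names and the sample numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    scale_factor_tb: float   # size of the sorted data set, in terabytes
    elapsed_seconds: float   # wall-clock time of the measured run
    total_cost_usd: float    # total price of the system under test
    avg_power_watts: Optional[float] = None  # only set for the optional energy test


def performance(r: RunResult) -> float:
    """Assumed throughput metric: terabytes processed per elapsed hour (higher is better)."""
    return r.scale_factor_tb / (r.elapsed_seconds / 3600)


def price_performance(r: RunResult) -> float:
    """Dollars per unit of performance (lower is better)."""
    return r.total_cost_usd / performance(r)


def energy_efficiency(r: RunResult) -> Optional[float]:
    """Watts per unit of performance (lower is better); None if power was not measured."""
    if r.avg_power_watts is None:
        return None
    return r.avg_power_watts / performance(r)


if __name__ == "__main__":
    # Hypothetical system: 10 TB sorted in two hours, $500,000 total cost, 20 kW average draw.
    run = RunResult(10, 7_200, 500_000, 20_000)
    print(performance(run))        # 5.0
    print(price_performance(run))  # 100000.0
    print(energy_efficiency(run))  # 4000.0
```

Expressing cost and power per unit of performance lets a cheaper but slower system be compared on the same footing as a faster, more expensive one.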

The test must be run twice, according to TPC rules, and the slower of the two runs is the official result. Published TPC results can be challenged by other parties within 60 days.
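The run-selection rule amounts to reporting the worse of the two measurements. This is a minimal sketch; the function names are hypothetical, and the throughput figure assumes a metric of terabytes sorted per elapsed hour, which is an illustration rather than the specification's exact formula.

```python
from __future__ import annotations

from typing import Sequence


def reported_run(elapsed_seconds: Sequence[float]) -> float:
    """Return the elapsed time that counts: the slower of the two required runs."""
    if len(elapsed_seconds) != 2:
        raise ValueError("expected exactly two runs")
    return max(elapsed_seconds)


def reported_throughput(scale_factor_tb: float, elapsed_seconds: Sequence[float]) -> float:
    """Assumed metric: terabytes per hour, computed from the slower run."""
    return scale_factor_tb / (reported_run(elapsed_seconds) / 3600)


if __name__ == "__main__":
    # Hypothetical 10 TB test: one run took one hour, the other two hours.
    print(reported_run([3_600, 7_200]))             # 7200
    print(reported_throughput(10, [3_600, 7_200]))  # 5.0
```

One effect of this rule is that a single unusually fast run cannot become the published figure.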
