Splice Machine taps Apache Spark for in-memory data muscle

Version 2.0 of the hybrid database offers up to 20x performance improvements over traditional systems at a quarter of the cost, the company says


Splice Machine's booth at the Oracle OpenWorld conference in San Francisco on Oct. 26, 2015.

Credit: Katherine Noyes

Today's vast data volumes have spawned a variety of new database contenders, each with particular strengths and features to recommend it. Splice Machine is one such upstart, and while it has always banked on the scale-out capabilities of Hadoop, on Tuesday it placed an accompanying bet on Apache Spark's in-memory technology.

Splice Machine 2.0, which is now in public beta, integrates the open-source Apache Spark engine into its existing Hadoop-based architecture, creating a flexible hybrid SQL database that lets businesses perform transactional and analytical workloads at the same time.

"Most in-memory systems require you to store all data in memory," said Monte Zweben, Splice Machine's CEO, in an interview last month.

Such technologies can become prohibitively expensive as data volumes grow. "We're doing just compute in memory -- you can store data elsewhere," he said.

Splice Machine 2.0 uses in-memory computation to deliver analytical business-intelligence results faster, but it relies on Hadoop's HBase database to store and access data durably at scale. Benefits include lower cost and higher speed, Zweben said.

"Our endeavor is to use in-memory to create an integrated hybrid technology," he said. "We'll have transactions hitting our database while simultaneously doing the BI without either impeding the other."

With separate processes and resource management for Hadoop and Spark, the Splice Machine RDBMS can ensure that large, complex analytical-processing queries do not overwhelm time-sensitive transactional ones. For example, users can set custom priority levels for analytical queries to ensure that important reports are not blocked behind a massive batch process that consumes all cluster resources.

The result is performance between 10 and 20 times better than what's offered by traditional relational database management systems, at as little as one-fourth the cost, the company said.

Splice Machine 2.0 is particularly well-suited for use in applications including digital marketing, operational data lakes, data warehouse offloads and the Internet of Things, it added. The bottom-line benefit, Zweben said, "is being able to make decisions in the moment."
