November 06, 2013, 5:02 PM — Do you have a data warehouse that stores more than 300 petabytes of data and struggle with the latency of queries? Well, few companies have data at Facebook's scale, but the performance of queries against your data warehouse is often still a serious productivity issue. Facebook has been developing a solution to that problem and today it offered that answer up to the open source community. It calls the answer Presto.
Today, Facebook released Presto to the open source community under the Apache 2.0 license.
"Facebook's warehouse data is stored in a few large Hadoop/HDFS-based clusters," writes Martin Traverso, software engineer at Facebook, in a blog post Wednesday.
"Hadoop MapReduce and Hive are designed for large-scale, reliable computation, and are optimized for overall system throughput. But as our warehouse grew to petabyte scale and our needs evolved, it became clear that we needed an interactive system optimized for low query latency."
The Magic of Presto
Presto supports standard ANSI SQL, including complex queries, aggregations, joins and window functions. The engine was designed with a simple storage abstraction that, Traverso says, makes it easy to provide SQL query capability against HDFS, other well-known data stores like HBase and even custom systems like the Facebook News Feed backend. Storage plugins, which Facebook calls connectors, provide interfaces for fetching metadata, getting data locations and accessing the data itself.
"Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook," Traverso says.
"Presto is amazing," says Chris Gutierrez, data scientist at Airbnb, which is among the small number of external companies with which Facebook has already shared the Presto code and binaries. "A lead engineer got it into production in just a few days. It's an order of magnitude faster than Hive in most of our use cases. It reads directly from HDFS, so unlike Redshift, there isn't a lot of ETL before you can use it. It just works."
"We're really excited about Presto," adds Fred Wulff, a software engineer at Dropbox, which has also been testing the engine. "We're planning on using it to quickly gain insight about the different ways our users use Dropbox, as well as diagnosing problems they encounter along the way. In our tests so far it's been rock solid and extremely fast when applied to some of our most important ad-hoc use cases."
Facebook currently has more than one million active users that generate a never-ending stream of data. The company operates one of the largest data warehouses in the world-it stores more than 300 petabytes of data used for applications that range from traditional batch processing to graph analytics, machine learning and real-time interactive analytics.
Inside Apache Hive
Apache Hive, also initially developed at Facebook, has become the most common infrastructure for querying massive data warehouses on Hadoop clusters. But Presto's execution model is fundamentally different than Hive, Traverso says, making it better-suited to interactive queries.
"Hive translates queries into multiple stages of MapReduce tasks that execute one after another," Traverso says. "Each task reads inputs from disk and writes intermediate output back to disk. In contrast, the Presto engine does not use MapReduce. It employs a custom query and execution engine with operators designed to support SQL semantics. In addition to improving scheduling, all processing is in memory and pipelined across the network between stages. This avoids unnecessary I/O and associated latency overhead. The pipelined execution model runs multiple stages at once and streams data from one stage to the next as it becomes available. This significantly reduces end-to-end latency for many types of queries."
Facebook began working on Presto in fall 2012 and had a production system up and running early in 2013. By spring of this year, it had rolled Presto out to the entire company-more than 1,000 employees use it daily, running more than 30,000 queries processing one petabyte daily. Facebook has deployed the engine in multiple geographical regions and notes that it has successfully scaled a single cluster to 1,000 nodes.
The engine still has its challenges. Traverso says the main restrictions at this stage are a size limitation on join tables and cardinality of unique keys/groups. Additionally, it lacks the ability to write out data back to tables. But those issues are all front-and-center in Facebook's roadmap for Presto. Over the next few months, Facebook plans to remove restrictions on join and aggregation sizes and add the ability to write output tables.
"We are also working on a query "accelerator" by designing a new data format that is optimized for query processing and avoids unnecessary transformations,"Traverso says. "This feature will allow hot subsets of data to be cached from backend data store and the system will transparently use cached data to "accelerate" queries. We are also working on a high-performance HBase connector."
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud. Follow everything from CIO.com on Twitter @CIOonline, Facebook, Google + and LinkedIn.
Read more about business intelligence (bi) in CIO's Business Intelligence (BI) Drilldown.