October 13, 2010, 1:30 PM — Taking aim at the growing problem of big data management, EMC has released a data warehouse appliance tweaked to consume lots of data really quickly.
The Greenplum Data Computing Appliance takes advantage of an MPP (massively parallel processing) technology developed by Greenplum, a firm acquired by EMC in July.
EMC claims the appliance can ingest data twice as quickly as competing products, which EMC identified as Oracle Exadata, IBM Netezza and Teradata's enterprise data warehouse offering. A single rack can ingest 10 terabytes per hour, the company claims.
The appliance will be marketed to organizations trying to derive intelligence from large amounts of incoming data, said Scott Yara, who was the president of Greenplum and is now vice president of products at EMC.
"Machines sitting on the network or on the Web are generating much more data than humans ever could. All the mobile phones, sensor networks and routers are pouring off millions of events each day," he said. In order to make sense of this input, "businesses are forced to create all this data analysis infrastructure that they never had to before."
To ingest data more quickly, Greenplum adopted a parallel processing architecture long used by the high-performance computing community.
Most data warehouse appliances have a single master node through which all data must enter, Yara explained. This approach can be a bottleneck when trying to import large amounts of data quickly. In the MPP approach, each server on a rack gets a dedicated Ethernet connection.
"Instead of loading the data into one system and trying to distribute it, [the Greenplum architecture] loads the data in parallel to all the servers in a cluster," Yara said. In a peer-to-peer fashion, the servers coordinate amongst themselves to balance the data across all the nodes.
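The idea of spreading incoming rows across every server, rather than funneling them through one node, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how Greenplum decides where a row goes, so the hash-on-a-key scheme, the `device_id` field and the function names below are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_SERVERS = 16  # servers in a single rack, per the article


def target_server(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Pick a server for a row by hashing its key (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def distribute(rows, num_servers: int = NUM_SERVERS):
    """Group rows by destination server so each batch can be sent directly."""
    placement = defaultdict(list)
    for row in rows:
        placement[target_server(row["device_id"], num_servers)].append(row)
    return placement


# Hypothetical machine-generated events, as in Yara's sensor example.
rows = [{"device_id": f"sensor-{i}", "value": i} for i in range(1000)]
placement = distribute(rows)
```

Because every loader can compute the destination on its own, no single node has to see all the data first.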
The MPP architecture also allows the data analysis to be executed in parallel across the servers. "You can break a single query up across all the machines," Yara said.
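Breaking one query up across machines generally means each server computes a partial answer over its own slice of the data and a final step combines the partials. The Python sketch below illustrates that pattern for an average; it uses threads on one machine as a stand-in for servers, and the function names and sample data are assumptions, not Greenplum's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(partition):
    """Work done on one server: summarize only the local slice."""
    return sum(partition), len(partition)


def parallel_average(partitions):
    """Run the partial step on every partition at once, then combine."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Three hypothetical partitions, one per server.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
result = parallel_average(partitions)  # 45 / 9 = 5.0
```

Note that the average is combined from sums and counts rather than from per-partition averages, which would give the wrong answer when partitions differ in size.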
The Greenplum Data Computing Appliance, available now, offers database software (Greenplum Database 4.0) preloaded on an integrated set of servers, along with storage and networking. A single rack has 16 servers, each running two Intel E5670 hexacore processors. The appliance can be purchased as a half-rack, a single rack, or a multiple-rack configuration. Each rack can hold up to 36 terabytes of uncompressed data, and a 24-rack system can hold up to 5 petabytes of compressed data. A 24-rack system can run a total of 4,608 database cores.
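The capacity figures can be checked with simple arithmetic. The short Python calculation below uses only the numbers quoted in the article; the compression ratio and the fill time are derived here for illustration and were not stated by EMC, and the calculation assumes 1 petabyte equals 1,000 terabytes.

```python
SERVERS_PER_RACK = 16
CPUS_PER_SERVER = 2
CORES_PER_CPU = 6            # hexacore
MAX_RACKS = 24
UNCOMPRESSED_TB_PER_RACK = 36
COMPRESSED_PB_MAX = 5
INGEST_TB_PER_HOUR = 10      # claimed rate for a single rack

cores_per_rack = SERVERS_PER_RACK * CPUS_PER_SERVER * CORES_PER_CPU
total_cores = cores_per_rack * MAX_RACKS

uncompressed_total_tb = UNCOMPRESSED_TB_PER_RACK * MAX_RACKS

# Derived, not quoted: ratio implied by the two "up to" capacity figures.
implied_compression = COMPRESSED_PB_MAX * 1000 / uncompressed_total_tb

# Derived, not quoted: hours to fill one rack at the claimed ingest rate.
hours_to_fill_rack = UNCOMPRESSED_TB_PER_RACK / INGEST_TB_PER_HOUR

print(cores_per_rack, total_cores, uncompressed_total_tb)
print(round(implied_compression, 2), hours_to_fill_rack)
```

The core count matches the 4,608 figure given for a 24-rack system: 192 cores per rack times 24 racks.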