Event shows the many faces, challenges of big data

New applications emerge but changing skills demands and still-growing technology stand in the way

By , IDG News Service |  Software

But Hadoop in its current form has serious limitations, said Michael Stonebraker, a Massachusetts Institute of Technology professor and founder of a number of database vendors. He was also the primary architect for the Ingres and Postgres database systems and is currently CTO of VoltDB.

For one, it "has terrible performance on data management," he said. In addition, Hadoop is a low-level interface that requires people to program in Java, Stonebraker said. "Forty years of research says high-level languages are good."

The problems Stonebraker cited could be mitigated over time, however, given that an array of vendors have been rolling out various tools meant to make Hadoop easier to use.

Meanwhile, EMC's Greenplum division is "building a platform for the future of big data," said George Radford, field CTO, during a panel discussion. That includes both row-based and columnar stores, integrated Hadoop storage, and integration with the Gemfire in-memory data grid for in-memory analytics, he said. This integration is crucial, according to Radford. "One of the problems with point solutions is with big data, the last thing you want to do it move it. You want to ingest it and analyze it in place."

But a new problem for big data is emerging even as companies like EMC Greenplum make these technological strides, Radford added. "Like everyone else here, we're looking for data scientists. As we solve the platform issues, people are going to be transformed from bit-tweakers and tuners to active partners with the business."

At another point, talk turned to big data's relationship with cloud computing, particularly public infrastructure offerings like Amazon Web Services, which offer raw compute power for developers.

Such systems present "an extremely challenging environment" for big data processing given the limited control users ultimately have over factors like the underlying network and storage, said Fritz Knabe, distinguished engineer at IBM's Netezza division.

But the public cloud does make sense for large processing jobs in some cases, Stonebraker said. "If you are doing month-end reporting and you need 1,000 processors for three hours, go ahead and do that on the [public] cloud. There's some low-hanging fruit."

Chris Kanaracus covers enterprise software and general technology breaking news for The IDG News Service. Chris's e-mail address is Chris_Kanaracus@idg.com

Join us:






Ask a Question