March 21, 2011, 11:45 AM — Corporate efforts to glean business intelligence from the massive volumes of data generated by Web server logs and social media have led to a surge of interest in open-source Hadoop software.
Hadoop is designed to process terabytes and even petabytes of unstructured and structured data. It breaks large workloads into smaller data blocks that are distributed across a cluster of commodity hardware for faster processing.
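The split-and-distribute idea can be illustrated with a small Python sketch. This is not Hadoop's API: the function names (`split_into_blocks`, `count_block`, `count_all`), the sample log lines, and the use of local threads in place of cluster machines are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample of Web server log lines; the last field is the status code.
LOG_LINES = [
    "GET /index.html 200",
    "GET /missing.html 404",
    "GET /about.html 200",
    "POST /submit 500",
    "GET /index.html 200",
]


def split_into_blocks(records, block_size):
    """Break a large workload into smaller blocks of at most block_size records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def count_block(block):
    """Process one block independently: count status codes in its lines."""
    return Counter(line.split()[-1] for line in block)


def count_all(records, block_size=2, workers=3):
    """Hand each block to a worker, then merge the partial results.

    Local threads stand in for the machines of a cluster here (an assumption
    of this sketch); a real cluster would also move the blocks across nodes.
    """
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_block, blocks))

    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # merge step: add up the per-block counts
    return total


if __name__ == "__main__":
    print(count_all(LOG_LINES))
```

Because each block is counted on its own, adding more workers lets more blocks be processed at once, which is the property the article attributes to Hadoop clusters.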
The technology -- already used by Web giants such as Facebook, eBay, Amazon and Yahoo -- is increasingly being adopted by banking, advertising, biotech and pharmaceutical companies, said Stephen O'Grady, an analyst at RedMonk.
Tynt Multimedia, a Web analytics firm that collects and analyzes nearly 1TB of data per day, switched to Hadoop about 18 months ago when its MySQL database system began collapsing under the sheer volume of data it was collecting, said Cameron Befus, Tynt's vice president of engineering.
Relational database systems are good at data retrieval and queries but don't accept new data quickly. "Hadoop reverses that. You can put data into Hadoop at ridiculously fast rates," Befus said. Getting the data back out, however, typically means using tools such as Pig or Hive, which let users write scripts or SQL-like queries rather than hand-coding processing jobs.
This version of this story was originally published in Computerworld's print edition. It was adapted from an article that appeared earlier on Computerworld.com.