Hadoop targets data sets too large and cumbersome to manage and analyze using conventional database technology. It does this by dispersing big data processing tasks across multiple computing nodes.
Hadoop software tends to be grouped alongside NoSQL databases as big data technology. The core components of Hadoop are MapReduce, which distributes processing jobs across a Hadoop cluster, and the Hadoop Distributed File System. A number of other open-source projects, as well as some commercial software, round out the Hadoop ecosystem.
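To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python. It is an illustration of the programming model only and does not use Hadoop's own APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On an actual cluster, the framework would run the map and reduce steps on different nodes and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Assumption: everything runs in one process here. On a cluster,
    # each chunk of input would be mapped on a separate node.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The point of splitting the work this way is that each map call depends only on its own input line, and each reduce call only on the values for its own key, so both steps can in principle be spread across many machines.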
A company's journey into this ecosystem could well begin as an informal experiment. For example, Weber says a company may have an employee interested in Hadoop who downloads the software and builds a small cluster.
Doing something more ambitious with Hadoop will typically require additional resources. In Shutterfly's case, the organization started with in-house resources, but now works with an outside contractor and plans to bring on additional help. Shutterfly also aims to harness Hadoop for website analytics. The company hopes to glean greater insight into customer transactions and the website's overall technical performance.
While Shutterfly works with the contractor on a limited basis, the company will be working with vendors like Hortonworks to start "an effort that is much more formalized," Weber says. Contractor and vendor resources will initially focus on getting the company's Hadoop project off the ground. Weber says he also aims to train a small group of in-house personnel beyond introductory Hadoop knowledge.
Monsanto, an agricultural products company based in St. Louis, also finds itself cultivating internal resources and looking for outside support. The company's geographic location, away from the big IT centers on the East and West coasts, creates Hadoop recruiting and hiring issues. "Being in the Midwest, that is a challenge for us," says Lori Yancey, R&D IT staffing lead at Monsanto.
The company has been evaluating Hadoop since late 2009. Last year, Monsanto decided to build out a full production cluster, notes Erich Hochmuth, R&D IT high performance analytics lead at Monsanto. He says the company has a couple of Hadoop projects underway and uses the platform "for analytics over large unstructured and semi-structured datasets."