While the majority of executives (58 percent) believe finding the right technology is the biggest challenge their companies face in analyzing data, the majority (56 percent) of IT decision-makers charged with implementing Big Data programs believe finding the right staff is a bigger challenge than finding the right technology. And it should come as no surprise that 63 percent of stakeholders believe their company needs to develop new skills to turn data into business insights, especially math and statistics (17 percent), business operations and analysis (37 percent) and visual design and reporting (22 percent).
Developers with Apache MapReduce Skills Are in High Demand
One of the major challenges companies face when they set out to transform the data they store into actionable insight is finding developers able to create MapReduce jobs to query Hadoop-stored datasets. MapReduce is a difficult framework to use, because even a simple query has to be expressed as separate map and reduce steps.
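To see why the programming model takes getting used to, consider the classic word count. What follows is a minimal plain-Java sketch of the map, shuffle and reduce phases, run in memory on two sample lines. It does not use the Hadoop API, and the class name, method names and sample input are assumptions made purely for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: an in-memory imitation of the
// map/shuffle/reduce pattern, not code written against Hadoop.
public class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single count.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run the three phases in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {direct=1, email=1, logs=2, mail=1}
        System.out.println(wordCount(Arrays.asList("email logs", "direct mail logs")));
    }
}
```

Even this toy version splits a one-line question ("how often does each word appear?") across three functions. A real Hadoop job layers mapper and reducer classes, job configuration and cluster deployment on top of the same logic.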
"Folks that know MapReduce are a tough find and they're in high demand," says Brandon Mason, CTO of Upstream Software, a specialist in integrated marketing performance management. Upstream analyzes all the marketing data a retailer has-including Coremetrics or Omniture logs, keywords shoppers use, direct mail logs, email logs and so forth-to help retailers properly weight their marketing mix. "To do the secret sauce stuff, we really needed a platform to handle lots of different data sets. Sometimes it's very dirty."
Open Source Cascading Is Alternative to MapReduce
Enter Cascading, a stand-alone open source Java application framework designed as an alternative API to MapReduce. Cascading gives Java developers the ability to build Big Data applications on Hadoop using their existing skillset.
"I created Cascading in anger after having used MapReduce once in my life and vowing never to use it again," explains Chris Wensel, creator of Cascading.
Wensel authored Cascading as an open source project in 2007 and is now CEO of Concurrent, an enterprise Big Data application platform company that continues to drive development of Cascading as its primary commercial sponsor. Concurrent counts Twitter, Etsy and Upstream among its clients. Twitter has three internal teams that use Cascading to perform sophisticated statistical functions to analyze huge volumes of data from tweet contents, ad campaigns and user activity. Etsy executes more than 65 Cascading applications daily to extract data from its web logs and databases, which it uses to monitor and understand user behavior, support A/B site testing and power new features on its ecommerce site.