Monsanto's Hadoop initiatives focus on using the platform to build enterprise data processing pipelines for analyzing and storing data generated from scientific instruments. Hochmuth says building these analysis pipelines in Hadoop will allow Monsanto to scale as new scientific instruments are adopted and data volume increases as a result. Traditional solutions, on the other hand, require IT personnel to rewrite and re-engineer the analysis pipelines to accommodate increases in data volume.
Hochmuth says Monsanto has tapped Cloudera as a source of Hadoop know-how. Cloudera will offer consulting services to get Monsanto's Hadoop projects up and running. Once Monsanto has a team using Hadoop, the next step will involve building up its in-house knowledge, Hochmuth notes. To that end, Cloudera will provide on-site training sessions for Hadoop administrators, as well as ongoing enterprise support, he adds.
Consulting, Development, Training Address Hadoop Skills Shortage
Vendors addressing the Hadoop skills gap offer a mix of consulting, software development and training services. Key players here include Hadoop distributors and specialized IT services companies.
"Hadoop is only now moving from R&D domain to mainstream corporate arena. There are not very many professionals out there in the market," notes Timothy Diep, business development manager at DCKAP, a technology consulting company that provides Hadoop development and consulting. "There is a premium on people who know enough about the guts of Hadoop."
Diep says customers ask for three main skill sets: data analysts/data scientists, data engineers and data management professionals.
Analysts should have experience in SAS, SPSS and programming languages such as R. "These are the professionals who will generate, analyze, share and integrate intelligence gathered and stored in Hadoop environments," he says.
Data engineers, meanwhile, are responsible for creating the data processing jobs and building the distributed MapReduce algorithms that the data analysts use, Diep explains. Finally, data management personnel do three things, he says: decide whether to deploy Hadoop in the cloud or on premises, and select vendors and distributions; determine the size of the cluster; and decide whether the cluster will be used for running production applications or for quality-testing purposes.