April 18, 2013, 7:52 PM -- There's little doubt that data-derived insight will be a key differentiator in business success, and even less doubt that those who produce such insight are going to be in very high demand. Harvard Business Review called "data scientist" the "sexiest" job of the 21st century, and McKinsey predicts a shortfall of about 140,000 such professionals by 2018. Yet most companies are still clueless as to how they're going to make up this shortfall.
Unfortunately, the job description for a data scientist has become quite lofty. Unless your company is Google-level cool, you're going to struggle to hire your big data dream team (well, at least right now), and few firms out there could recruit them for you. Ultimately, most organizations will need to enlist the support of existing staff to achieve their data-driven goals, and train them to become data scientists. To accomplish this, you must determine the basic elements of data scientist "DNA" and strategically splice it into the right people.
Analyzing data scientist DNA
Part scientist, technologist, industry expert and artist, data scientists figure out where and how to get the data, determine what it means, apply the answers creatively, and convince the organization that they're right. Given the conventional wisdom that data science requires a high level of technical skill, the idea of injecting this DNA into your own employees may seem implausible. How do you re-educate your current workforce along such an overwhelmingly complex paradigm?
How to splice data scientist “DNA” into an existing team
I recently spoke with renowned data scientist DJ Patil, who pointed out that a “data scientist” is “one part hacker, one part analyst, and a whole lot of curiosity.” To turn a business analyst into a data scientist, the trick is to connect with their natural curiosity to motivate them to hone hacking skills:
Tip 1: Ask if they want to learn how to hack. If not, then pair them with a hacker.
Tip 2: If they want to learn how to hack, programming classes are next. The chief goal is to gain insight quickly, so I recommend starting with NoSQL, Pig, Cascading, HiveQL, and Wukong.
Tip 3: Give them “Big data curiosity tools” -- user-friendly platforms for deriving data-driven insight -- that they can apply to investigating a specific thesis.
Tip 4: Focus on answering a single business question that is perceived to be important to the company.
It’s important to start with a small, achievable, yet impactful goal. People are used to working with tools that yield canned results -- once they get a taste for how hacking with Big Data opens up a new world of insight, they’ll be hooked! You won’t have to coax them into sharpening their skills; they’ll be begging you for more training.
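As a rough illustration of Tip 4, a single business question such as "which sales region has the highest average order value?" can be answered in a few lines of Python. Everything in this sketch is hypothetical: the sample orders, the field names, and the `average_order_value_by_region` helper stand in for whatever data your analysts actually have access to. It shows the shape of the exercise, not a recommended toolchain.

```python
from collections import defaultdict

# Hypothetical order records; in practice these would come from a data store.
ORDERS = [
    {"region": "West", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 200.0},
    {"region": "East", "amount": 100.0},
    {"region": "South", "amount": 90.0},
]


def average_order_value_by_region(orders):
    """Return a dict mapping each region to its mean order amount."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        totals[order["region"]] += order["amount"]
        counts[order["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


# The single business question: which region has the highest average order?
averages = average_order_value_by_region(ORDERS)
best_region = max(averages, key=averages.get)
print(best_region, averages[best_region])
```

The point of keeping the question this narrow is that a business analyst can check the answer against what they already know about the company, which is what builds confidence in the approach.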
Splicing the genes
Let's think about it from a gene-splicing perspective. If you want to create a creature that can breathe underwater and fly, would it be more feasible to insert the genes for gills into a seagull, or splice the genes for wings into a herring? While I'll defer to geneticists for a specific answer, the general guideline is that you choose the option requiring the easiest modification.
In our case, we need to look at the data scientist's traits -- hard science expertise, computer programming skills, business acumen and creativity -- and determine which of them can most easily take on the “genes” of the others.
Flying fish and data-savvy business analysts
Your first impulse may be to say that you'll never train someone who isn't really a computing expert on data herding, because it is much too complex. That may be true, but consider that since the advent of the PC, the trend in technology has been toward user-friendliness. A 10-year-old kid with an iPhone can perform feats that would have required a team of the world's top computer scientists a few decades ago. This is being amplified in business technology through the cloud, which allows companies access to practically unlimited computational power.
Given that every aspect of computing is on the cusp of becoming far more accessible to the non-technical, does it really make sense to place the focus on computing skills for a data scientist? Or does it make more sense to outsource every possible back-end data function and arm those with business acumen, scientific or industry expertise and other talents that cannot be outsourced with simplified data analysis tools? Let's examine our options.
Inserting business DNA into IT staff? A matter of shortcuts and interests
Before you embark on this route, the first thing to ask is, do you have IT staffers who want to spend most of their time thinking about the business, rather than the tools that support the business?
The answer might be yes. But I'm going to go out on a limb and suggest that most of your IT staff entered the profession because they're interested in the technology that supports the business a lot more than they are in the business itself.
Also, consider it from this vantage point -- thanks to the miracle of user-friendliness, extremely complex technologies can be operated by people who are practically tech-illiterate. The reverse is not true. The principles of business -- marketing, economics, and an understanding of the ins and outs of a specific industry -- cannot be manipulated by someone who doesn't thoroughly understand them. To put it another way, there are shortcuts on the technology side that don't exist on the business side.
Inserting technical DNA into business staff
Once again, it depends on your goals. But as organizations begin this introspective journey, they may find that what they actually need are business managers who understand data, rather than scientists, per se. The fact is that data analysis tools are becoming much more user-friendly, and the cloud is rapidly taking us to the point where the data herding part of the job can be cost-effectively outsourced.
Rather than trying to bring technologists up to speed on aspects of business that they may have little interest in, it might make more sense to train those who have business acumen or analytical abilities on tools that will enable them to gain greater insight into what they're already interested in.
About the author:
Serial entrepreneur Jim Kaskade, CEO of Infochimps, the company that is bringing Big Data to the cloud, has been leading startups from their founding to acquisition for more than ten years of his 25 years in technology. Prior to Infochimps, Jim was an Entrepreneur-in-Residence at PARC, a Xerox company, where he established PARC’s Big Data program, and helped build its Private Cloud platform. Jim also served as the SVP, General Manager and Chief of Cloud at SIOS Technology, where he led global cloud strategy. Jim started his analytics and data-warehousing career working at Teradata for 10 years, where he initiated the company’s in-database analytics and data mining programs.