April 15, 2014, 11:32 AM — It's hard to resist the sparkly nirvana that big data, leveraged appropriately, promises to those who choose to embrace it. You can transform your business, become more relevant to your customers, increase your profits and target efficiencies in your market all by simply taking a look at the data you probably already have in your possession but have been ignoring due to a lack of qualified talent to glean value from it.
Enter the data scientist - arguably one of the hottest jobs on the market. The perfect candidate is a numbers whiz and savant at office politics who plays statistical computing languages like a skilled pianist. But it can be hard to translate that ideal into an actionable job description and screening criteria.
This article explains several virtues to look for when identifying suitable candidates for an open data scientist position on your team. It also notes some market dynamics when it comes to establishing compensation packages for data scientists.
Because "data scientist" represents a bit of a new concept, without a lot of proven job descriptions, you'll want to work closely with your human resources department on the rubric and qualifications you use to screen initial resumes and set up a first round of interviews. What follows are five salient points that should prove useful as you qualify candidates for a data scientist role.
1. A Good Data Scientist Understands Statistics and the Law of Large Numbers
Trends are seen in numbers. For example, a good data scientist understands, "This many customers behave in this certain way" or "This many customers intersect with others at this many precise points." Over large quantities of data, trends pop out in numbers.
A great data scientist has the skillset to understand trends in large numbers and an ability to translate that into predictive analytics. A good data scientist can interrogate large quantities of data and extract trends, then use predictive modeling techniques to anticipate behavior across that aggregate dataset. Statistics are also helpful in preparing reports for management and prescribing recommended courses of action.
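To make the statistical point concrete, here is a minimal Python sketch of why trends only become trustworthy over large quantities of data. The 30 percent repeat-purchase rate, the function name and the sample sizes are all assumptions made up for illustration; they do not come from any real dataset.

```python
import random

# Hypothetical share of customers who make a repeat purchase (an assumption).
TRUE_RATE = 0.30


def observed_rate(n_customers, seed=42):
    """Simulate n_customers and return the share who repeat-purchased."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    repeats = sum(1 for _ in range(n_customers) if rng.random() < TRUE_RATE)
    return repeats / n_customers


# Small samples can land far from the true rate; large ones settle near it.
for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} customers: observed rate = {observed_rate(n):.4f}")
```

A candidate who understands this will hesitate to call something a trend on the strength of a few dozen records, and can explain to management why the larger sample deserves more weight.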
While a mathematics degree would be ideal, many qualified candidates have taken a slightly more practical academic path. Don't be scared away by interviewees who lack advanced mathematics credentials. A focus on statistics in a candidate's academic career, whether at the bachelor's level or above, should prove sufficient for this type of position.
2. A Good Data Scientist Is Inquisitive
Part of the allure and mystique of big data is the art of teasing actionable conclusions from a giant haystack of (typically) unstructured data. Knowing how to write queries that find specific information is generally not enough. A candidate also needs to supply the context: which queries should be run, what the organization wants to know, and what it doesn't yet realize it would want to know.
Yes, great data scientists execute queries and database runs, but they also design queries that do more than return a defined set of results to answer a question someone already asked. Their queries reveal new insights into questions the organization has not yet asked. This is where the real value of a data scientist will present itself over the coming years.
While some might argue that this is a soft skill that's difficult to interview for, carefully crafted hypothetical scenarios can help. Present them during interviews to understand the candidate's thought process and approach to a problem, the various ways the candidate would attempt to glean the answers, and what other questions the candidate would pose to add value to the original query. Stress to candidates that outside-the-box thinking is encouraged, while limiting answers to only the problems posed is discouraged.
3. A Good Data Scientist Is Familiar With Database Design and Implementation
It's important for today's data scientists to sit somewhere between an inquisitive university research scientist (which is essentially what the previous point describes) and a software developer or engineer: someone who knows how to tune the lab and operate the machinery well.
Even though much of what falls under the "big data" category is known as unstructured data, a fundamental understanding of both relational and columnar databases can really serve a data scientist well. Many corporate data warehouses are of the traditional row-based relational database sort. While big data is new and alluring, plenty of actionable data and trends can still be teased from traditional databases.
Data scientists will also play a key role in setting up analytics and production databases to take advantage of new techniques. A history of working with databases would provide great context for setting up new systems in the new role.
Additionally, many big data software developers build SQL-like languages into their products in an attempt to woo traditional database administrators who have no desire to learn a MapReduce-like language. Knowledge of traditional SQL will continue to pay dividends, allowing data scientists to play nicely and integrate well with the other database professionals you already have on staff.
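As a rough illustration of the kind of traditional SQL a candidate should be comfortable with, the sketch below uses Python's built-in sqlite3 module to summarize revenue by region from a small in-memory table. The table name, columns and rows are all hypothetical and exist only for the example.

```python
import sqlite3


def revenue_by_region():
    """Build a tiny in-memory orders table and aggregate it with plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
    # Hypothetical sample rows, invented for illustration only.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("alice", "east", 120.0),
            ("bob", "east", 80.0),
            ("carol", "west", 200.0),
            ("alice", "east", 40.0),
            ("dave", "west", 50.0),
        ],
    )
    # Count distinct customers and total revenue per region, largest first.
    rows = conn.execute(
        """
        SELECT region,
               COUNT(DISTINCT customer) AS customers,
               SUM(amount) AS revenue
        FROM orders
        GROUP BY region
        ORDER BY revenue DESC
        """
    ).fetchall()
    conn.close()
    return rows


for region, customers, revenue in revenue_by_region():
    print(region, customers, revenue)
```

An interviewer could ask a candidate to talk through a query like this one and then suggest what else the same table might reveal, which ties the database skill back to the inquisitiveness described earlier.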
4. A Good Data Scientist Has Baseline Proficiency in a Scripting Language
Your most qualified candidates should be awarded extra points for knowing Python at least somewhat well. Many query jobs over vast quantities of unstructured data are issued in scripts and take quite some time to run.
Python is generally accepted as the most compatible, most versatile scripting language for working with columnar databases, MapReduce-style queries and other elements of the data scientist puzzle. Python is an open source language known to be fairly usable and easy to read, so it shouldn't pose much of a hurdle for your base of data scientist candidates to overcome.
You could also consider "pseudo code" skills, or the ability to write almost in plain English how an algorithm or a query would work. Such a test would show the quality of an applicant's thinking and approach to a problem, as well as how the applicant would begin to solve it, regardless of whether he or she actually possesses the skills in any given language to pull it off.
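One way to picture the pseudo code exercise is to pair a plain-English plan with a short Python translation. The sketch below counts page visits in a MapReduce style; the log format, the sample lines and the function names are assumptions chosen for illustration, not taken from any particular product.

```python
from collections import defaultdict

# Plain-English plan a candidate might write first:
#   1. For each log line, emit the pair (page, 1).          <- "map" step
#   2. Group the pairs by page and add up the ones.         <- "reduce" step
#   3. Report pages from most visited to least visited.

# Hypothetical log lines in the form "<user> <page>".
LOG_LINES = [
    "alice /home",
    "bob /pricing",
    "alice /pricing",
    "carol /home",
    "bob /pricing",
]


def map_phase(lines):
    """Emit one (page, 1) pair per log line."""
    for line in lines:
        _user, page = line.split()
        yield page, 1


def reduce_phase(pairs):
    """Sum the emitted counts for each page."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


visits = reduce_phase(map_phase(LOG_LINES))
for page, count in sorted(visits.items(), key=lambda kv: kv[1], reverse=True):
    print(page, count)
```

The three commented steps are what you would grade in a pseudo code test. The Python beneath them shows how little additional work the translation takes once the thinking is sound.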
Be Prepared to Show Data Scientists the Money
As long as demand for data scientists outstrips the supply of qualified candidates, salaries will keep rising. In almost any metro market in the United States, data scientists are receiving six-figure base salaries - obviously higher in high-cost markets such as the West Coast. In Silicon Valley, in particular, multiple offers for a qualified candidate are not uncommon.
Don't attempt to pay below market rates for this position. Even startups are departing from their traditional modus operandi of loading up on equity and paying measly wages: They are paying data scientists comfortable salaries and giving them the chance to work on challenging new products. Put simply: Don't cheap out and expect great talent.
Jonathan Hassell runs 82 Ventures, a consulting firm based out of Charlotte. He's also an editor with Apress Media LLC.