What is the central skill set that you should look for in a data scientist? Obviously you need a mathematician, or at least someone with an advanced mathematics background, but what are the attributes of a recruit who will have the overall set of personal assets to make an effective big data analyst? Is it easier and more effective to just hire a recruiting firm, or try to fill your data scientist needs with an in-house effort? This is the first time I've had any input in the hiring of a data scientist, and I'm a little surprised at how much competition there is for mathematicians at the moment.

Hi ncharles,

Here's a link to the Data Scientist category on Simply Hired. You can find many job descriptions and requirements listed here for real Data Scientist jobs.

I think that's a great place to start since you can see what actual companies are looking for right now.

I knew that big data was here in a big way when I heard a two-day piece on NPR about it. It is like scouting professional athletes to identify and recruit the crème de la crème of the data scientist crop. The best data scientists are not necessarily "just" mathematicians. To quote DJ Patil from Greylock Partners, you are looking for a rare breed, "someone with a brain for math, finesse with computers, the eyes of an artist and more." Companies are opening branches in the college towns of universities with outstanding math programs just to recruit the best candidates. Cataphora, for example, opened an office in Ann Arbor to recruit graduates from Michigan's math department. There is a lot of competition for the most talented prospects, so there is one thing I think you can count on: you will have to offer your recruit a nice fat paycheck, whether you find them yourself or hire a recruiting firm to do it.

We've found that there are really two different kinds of data scientists. The first kind uses machine learning algorithms and packages like Weka; they are often data engineers with experience in distributed computing and Hadoop, strong scripting skills, and some statistics background. The second kind are PhD-level statisticians trying to achieve even greater lift by implementing better statistical formulas in highly competitive computational environments, where you have milliseconds to react in real time.

