How smart crowds are solving big data problems

Businesses and government agencies are using crowdsourced data science sites, such as Kaggle, to solve real problems.

Kaggle has a separate service called Connect, which combines the world’s top data scientists with tools developed to provide corporate customers with the best analytic solution possible. Eight current customers, according to Goldbloom, pay anywhere from $30,000 to $100,000 a month. Participants in each project sign non-disclosure agreements and work within a private area on virtual machines that aren't available to anyone else, and they can't move the data to any outside destination. Customers can bring either a specific business problem to solve or an unexplored data set from which to extract actionable insights. Goldbloom says, "The same people keep performing well irrespective of the problem. It's this fact that allows us to do private gigs, because we can reliably identify who will make the best fit."

So who are some of the Kaggle rock stars? They are from all over the world. I interviewed two who coincidentally have day jobs in the financial services industry.

In second place overall is Jason Tigg, a British physicist who works in statistical arbitrage trading. He has entered 14 competitions and won a few of them. He is motivated not by the prize purses but by the chance to learn new machine learning techniques. "I feel a buzz around the area, which I imagine was how physics felt around the turn of the last century. People are trying out new ideas and no one knows for sure where we will all end up." He got his start with the first Netflix prize and was hooked. (Netflix held its own data science contest to improve the algorithms that recommend movies to its members.) He told me via email that "there is a lot of trial and error during the process" to refine his entries.

Another top finisher (and currently third overall) is Olexandr Topchylo of Ukraine, who develops trading strategies for financial markets and holds doctorates in physics and mathematics. "Contests are an ideal way to compare quality of your algorithms and your abilities against other analysts," he emailed me. He has participated in nine different contests, including one of the Facebook recruiting competitions, where he came in seventh but has not yet had an opportunity to interview with the company. "Every year I take part in the Automated Trading Championship. As opposed to Kaggle contests, here a participant has not only to devise an algorithm for predicting some values but he has also to develop a program which will be working online for three months on the organizers' server without human intervention. The code has to make virtual trades on real exchange rates. I took part in all five championships, and one time I even managed to win!"

Besides Kaggle and the trading contest, there are plenty of other places to start your machine learning competitions, including India-based, for the life sciences and mainly for education and research projects.

How to host a successful Kaggle contest

If you are interested in hosting a competition, you must have a data set that you can scrub of personal information before using it for the competition, and a budget for your prize purse. You fill out an entry form on the Kaggle website and their sales staff will consult with you to put the competition online. If a winner is declared (and there is usually a winner), the company pays the prize purse. Companies that have hosted competitions share these tips for a successful contest:

Try to be as inclusive as possible. NASA and Boehringer have hosted contests, and kept the jargon and specific domain references to a minimum to encourage entrants from fields other than cosmology and biotech.

Prepare your data as both a training set (which contestants use to build and test their initial models) and a contest set (which isn't available to the contestants, but is used to score the entries).

Consider non-monetary incentives. Most of the Kagglers aren't doing this for the dough, but want the satisfaction of a job well done, or a chance to meet with your staff, or some other reward. For the NASA contest, the winners were invited to NASA's labs to meet with its scientists.

Finally, use your own social media and email contacts to publicize the contest to ensure the widest possible field.
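The tip about preparing a training set and a contest set can be illustrated with a short Python sketch. This is only one possible approach, not Kaggle's own procedure: the function name `split_for_contest`, the 30 percent holdout fraction, the fixed seed, and the toy records are all assumptions made for the example.

```python
import random


def split_for_contest(records, holdout_fraction=0.3, seed=42):
    """Split records into a public training set and a private contest set.

    The name, default fraction, and seed are illustrative assumptions.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")

    # Copy the input so the caller's list is left untouched.
    shuffled = list(records)

    # A seeded shuffle makes the split reproducible for the host.
    random.Random(seed).shuffle(shuffled)

    # The first slice is held back for scoring; the rest is released.
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    contest_set = shuffled[:n_holdout]
    training_set = shuffled[n_holdout:]
    return training_set, contest_set


# Toy records standing in for a scrubbed data set (hypothetical fields).
rows = [{"id": i, "label": i % 2} for i in range(10)]
training_set, contest_set = split_for_contest(rows)
```

Shuffling before splitting keeps the held-back records from clustering at one end of the file, and fixing the seed lets the host recreate exactly the same split later when scoring entries.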

