How to explain to big data newbies why correlation doesn't equal causation

People new to Big Data find it easy to assume that because two or more data sets appear to be related, they are.

With the explosion of interest in Big Data, everyone in every department is looking for actionable intelligence. That's great, but there's a downside: you may find yourself trying to explain to, say, your VP of sales that although sales of barbecue sauce appear to track the selling price of beef, you can't say for certain that the two are connected, and that it would be inadvisable to act on that conclusion without deeper analysis.

"What?!" she'll say. "I can see with my own eyes that the curvy things go up and down together." "Ah," you can reply, "let me show you something ..." and then you show her the Spurious Correlations web site.

This site is a treasury of examples that demonstrate, very clearly, that correlation does not prove causation. For example, the correlation between US spending on science, space, and technology and suicides by hanging, strangulation and suffocation is a remarkable 99.2%, yet no one in their right mind would say that one causes the other.

Similarly, the per capita consumption of cheese in the US correlates 94.7% with the number of people who died by becoming tangled in their bedsheets. That link is just as easily rejected as not causative, even though there's a very high degree of correlation.
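If your VP wants to see the effect with numbers rather than charts, here is a minimal Python sketch. The figures are made up for illustration (they are not real barbecue sauce or beef data), and the helper names `pearson` and `changes` are my own. It shows two series that both drift upward over time scoring a very high correlation, and that correlation all but vanishing once you compare year-to-year changes instead of raw levels.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def changes(series):
    """Period-to-period differences, which strip out a shared trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical yearly figures, invented for this example:
# both simply trend upward, with unrelated wiggles along the way.
sauce_sales = [10, 11, 14, 15, 18, 19, 22, 23, 26]
beef_price = [5.0, 5.2, 5.4, 6.0, 6.6, 6.8, 7.0, 7.6, 8.2]

r_levels = pearson(sauce_sales, beef_price)
r_changes = pearson(changes(sauce_sales), changes(beef_price))

print(f"correlation of raw levels:       {r_levels:.2f}")
print(f"correlation of yearly changes:   {r_changes:.2f}")
```

The raw levels correlate strongly only because both series rise over the same period. Comparing changes is just one quick sanity check, not a test of causation: a low number here doesn't prove the series are unrelated, and a high one wouldn't prove that one drives the other.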

Published by Tyler Vigen, the Spurious Correlations site currently contains 27,724 correlations, many of which are very amusing (for example, the marriage rate in New York has an 87.9% correlation with murders by blunt objects). Tyler's mini-lecture on correlation and causation is worth putting in front of the unwashed to get 'em up to speed.


This story, "How to explain to big data newbies why correlation doesn't equal causation" was originally published by Network World.
