October 31, 2011, 4:54 PM — It's not as if it hasn't been done before, but researchers at Stanford quantified ways to crack the CAPTCHA test many web sites use to make users prove they're human, and recommended ways to make sure the test continues to work.
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a challenge-response test invented at Carnegie Mellon in 2000 to foil automated form fillers by forcing users to type out text presented as images of words, warped in ways that are difficult for spambots or other programs to read.
Though there are dozens, if not hundreds, of variations using different types of images, text or performance requirements (one particularly vicious variant requires users to solve a mathematical formula before using a site), most sites use some form of the warped-text scheme as their test.
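Math-based variants may be vicious for users, but a challenge presented as plain text is trivial for a bot to answer. A minimal sketch (the challenge format and function name here are hypothetical, not from the Stanford paper):

```python
import re

def solve_math_captcha(challenge: str) -> int:
    """Solve a plain-text arithmetic CAPTCHA like 'What is 7 + 3?'.

    Hypothetical example: real math CAPTCHAs vary in format; this
    handles only a single 'a OP b' expression with +, - or *.
    """
    m = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", challenge)
    if not m:
        raise ValueError("unrecognized challenge: %r" % challenge)
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    return {"+": a + b, "-": a - b, "*": a * b}[op]

print(solve_math_captcha("What is 7 + 3?"))  # 10
```

A one-line regular expression defeats the scheme unless the formula itself is rendered as a distorted image, at which point it inherits all the weaknesses of the warped-text approach.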
Using a variety of image- and text-recognition algorithms and analyzing the patterns in which various CAPTCHA schemes divide text into segments, Stanford security researchers attacked the CAPTCHA tests used by Wikipedia, Authorize, Baidu, Blizzard, CNN, Digg, eBay, Google, Reddit, Slashdot and other popular sites.
By identifying and coding consistencies among various schemes, they eventually developed an automated cracking tool called Decaptcha.
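The segment-then-recognize idea the researchers exploited can be illustrated with a toy vertical-projection segmenter: count the ink in each column of a binarized image and treat each run of non-empty columns as a candidate character. This is a simplified sketch of the general technique, not Decaptcha's actual code:

```python
def segment_by_projection(image):
    """Split a binarized CAPTCHA image into character segments.

    'image' is a list of rows of 0/1 pixels (1 = ink). A vertical
    projection counts ink per column; each run of non-empty columns
    becomes a candidate character segment, returned as (start, end)
    column ranges. Toy sketch only: real schemes defeat this by
    overlapping or connecting adjacent characters.
    """
    width = len(image[0])
    ink = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(ink):
        if count and start is None:
            start = x                      # segment begins
        elif not count and start is not None:
            segments.append((start, x))    # blank column ends it
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

# Two blobs separated by a blank column -> two segments.
img = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(segment_by_projection(img))  # [(0, 2), (3, 4)]
```

Once a scheme can be segmented this predictably, each isolated glyph can be handed to an ordinary character recognizer, which is why the paper treats predictable segmentation as the core weakness.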
The tool (also available for download) successfully cracked 13 of the 15 most commonly used text-image schemes, though only 25 percent of the time, according to the paper. That link leads to the personal site of one of the researchers, Elie Bursztein, who posted links to both the paper describing the research and the tool itself.
Predictability is the enemy
The success rate may be low, but reliably cracking 13 of the 15 most common ways of presenting CAPTCHA tests (not just 13 of the 15 most popular sites using CAPTCHA) pretty thoroughly skewers its reputation as a reliable 'bot preventative.
An earlier paper from Bursztein and a slightly larger group of Stanford security researchers had actually reinforced CAPTCHA's reputation as a good 'bot preventative by showing that even humans had trouble decoding the images, let alone 'bots.