December 23, 2011, 4:31 PM — A pretty well executed content-mining analysis of Wikipedia adds some decent evidence that the ability to spell common words correctly is deteriorating even among the best-educated, most literate groups of people in the world – Internet users committed enough to making sure information in public spaces is accurate that they're willing to do the hard intellectual work of adding their own mistakes to replace the ones they find in Wikipedia.
Jon Stacey, who describes himself as a "tech enthusiast and MBA student at the University of Nebraska-Lincoln," took randomish samples from Wikipedia by retrieving articles without a particular plan about what he wanted, then spell-checked the content (remember that last step; it'll be important later).
The spell check itself was a problem because even Americans and Canadians, who speak almost the same language, spell some things very differently. Americans and the British, whose languages aren't even remotely similar, also have huge differences in the way they spell or pronounce words. "Schedule," for example, is spelled the same way in both the U.S. and U.K., but is pronounced "sk-ed-yoool" in the United States and "schrrrruruuuullllll" in England. No one is certain what the British intend to communicate using that sound, but they place great importance on it, so it's almost certain to be offensive.
By contrast, Canadians (almost the same language as Americans, but spelled differently, remember?) pronounce most words almost correctly but spell very much like the British. Except the words "please" and "sorry." In Canada, "sorry" is pronounced with two extra O's and two additional R's; in England neither word is used at all.
So Stacey had to compile his own dictionary and set of spellings so he would have something consistent to compare the Wikipedia misspellings against.
By selecting his own, and ignoring the style and spelling guides in Wikipedia itself, he guaranteed a higher rate of error than he should really have had, but as long as the dictionary itself remained stable, the comparison should have been able to create a statistically valid trend line mapping changes in the quality of spelling over time.
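For anyone who wants to see what "one fixed dictionary, one trend line" looks like in practice, here is a minimal Python sketch. It is not Stacey's code: the function names, the six-word dictionary and the three sample texts are all made up for illustration, and the trend is a plain least-squares slope, which is only one of several ways to draw such a line.

```python
import re
from statistics import mean

# Hypothetical stand-in for a hand-compiled dictionary; a real one would be far larger.
DICTIONARY = {"the", "cat", "sat", "on", "mat"}

# Invented sample texts keyed by year, purely for illustration (not Wikipedia data).
SAMPLES = {
    2005: "the cat sat on the mat",
    2008: "the cat sat on teh mat",
    2011: "teh cat sat on teh matt",
}


def misspelling_rate(text, dictionary):
    """Return the fraction of words in text that are missing from dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    misses = sum(1 for word in words if word not in dictionary)
    return misses / len(words)


def trend_slope(points):
    """Return the least-squares slope for (year, rate) pairs.

    Needs at least two distinct years, otherwise the denominator is zero.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in points)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# The same dictionary is used for every year, so the rates are comparable.
points = [(year, misspelling_rate(text, DICTIONARY))
          for year, text in sorted(SAMPLES.items())]
slope = trend_slope(points)
print(points)
print("rising" if slope > 0 else "flat or falling")
```

A positive slope only says the measured error rate went up against that particular dictionary. Whether the rise is statistically meaningful is a separate question that this sketch does not try to answer.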
His conclusion was "The test does not satisfactorily or definitively answer the hypothesis."
Which is B#%%&*!&^. Why read a whole mystery novel if the conclusion is going to be that the detective can't figure out who did it?
Stacey's test wasn't rigorous enough to give real, definitive, statistically valid evidence one way or the other.
The trend – though unproven – was pretty clearly that the quality of spelling was deteriorating over time.