November 05, 2011, 7:59 AM — Looking back on the development of speech recognition technology is like watching a child grow up, progressing from the baby-talk level of recognizing single syllables, to building a vocabulary of thousands of words, to answering questions with quick, witty replies, as Apple's supersmart virtual assistant Siri does.
Listening to Siri, with its slightly snarky sense of humor, made us wonder how far speech recognition has come over the years. Here's a look at the developments in past decades that have made it possible for people to control devices using only their voice.
1950s and 1960s: Baby Talk
The first speech recognition systems could understand only digits. (Given the complexity of human language, it makes sense that inventors and engineers first focused on numbers.) In 1952, Bell Laboratories designed the "Audrey" system, which recognized digits spoken by a single voice. Ten years later, at the 1962 World's Fair, IBM demonstrated its "Shoebox" machine, which could understand 16 words spoken in English.
Labs in the United States, Japan, England, and the Soviet Union developed other hardware dedicated to recognizing spoken sounds, expanding speech recognition technology to support four vowels and nine consonants.
They may not sound like much, but these first efforts were an impressive start, especially when you consider how primitive computers themselves were at the time.
1970s: Speech Recognition Takes Off
Speech recognition technology made major strides in the 1970s, thanks to interest and funding from the U.S. Department of Defense. The DoD's DARPA Speech Understanding Research (SUR) program, which ran from 1971 to 1976, was one of the largest of its kind in the history of speech recognition. Among other things, it was responsible for Carnegie Mellon's "Harpy" speech-understanding system. Harpy could understand 1,011 words, approximately the vocabulary of an average three-year-old.
Harpy was significant because it introduced a more efficient search approach, called beam search, to "prove the finite-state network of possible sentences," according to Readings in Speech Recognition by Alex Waibel and Kai-Fu Lee. (The story of speech recognition is very much tied to advances in search methodology and technology, as Google's entrance into speech recognition on mobile devices proved just a few years ago.)
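For readers curious about what beam search looks like in practice, here is a minimal sketch in Python. It is not Harpy's actual implementation: the tiny network, the word scores, and the names beam_search, network, and step_scores are all made up for illustration. The idea is that at each step only the few best-scoring partial sentences are kept, so the search never has to explore every path through the network.

```python
import math

def beam_search(network, start, finals, step_scores, beam_width):
    """Find a high-scoring word sequence through a finite-state network.

    network:     dict mapping a state to a list of (word, next_state) arcs
    step_scores: one dict per time step, mapping word -> log-probability
                 (hypothetical acoustic scores, invented for this sketch)
    beam_width:  how many partial hypotheses survive each step
    """
    # Each hypothesis is (total log-score, current state, words so far).
    beam = [(0.0, start, [])]
    for scores in step_scores:
        candidates = []
        for logp, state, words in beam:
            for word, next_state in network.get(state, []):
                if word in scores:
                    candidates.append(
                        (logp + scores[word], next_state, words + [word])
                    )
        # Prune: keep only the best few hypotheses.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = candidates[:beam_width]

    # Only hypotheses that end in a final state count as sentences.
    complete = [h for h in beam if h[1] in finals]
    if not complete:
        return None
    best = max(complete, key=lambda h: h[0])
    return best[2], best[0]

# Toy network: "call" must be followed by "home", "play" by "music".
network = {
    "S": [("call", "C"), ("play", "P")],
    "C": [("home", "E")],
    "P": [("music", "E")],
}
step_scores = [
    {"call": math.log(0.6), "play": math.log(0.4)},
    {"home": math.log(0.3), "music": math.log(0.7)},
]

wide = beam_search(network, "S", {"E"}, step_scores, beam_width=2)
narrow = beam_search(network, "S", {"E"}, step_scores, beam_width=1)
```

With a beam width of 2, both opening words survive the first step and the search settles on "play music." With a beam width of 1, "play" is pruned immediately and the search can return only "call home." That is the trade-off of the technique: a narrower beam means less work but a greater chance of discarding the best sentence early.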