Speech recognition through the decades: How we ended up with Siri

How did such sophisticated speech recognition technology come to be? It started back in the 1950s.

By Melanie Pinola, PC World

In 1990, Dragon launched the first consumer speech recognition product, Dragon Dictate, for an incredible price of $9,000. Seven years later, the much-improved Dragon NaturallySpeaking arrived. The application recognized continuous speech, so you could speak, well, naturally, at about 100 words per minute. However, you had to train the program for 45 minutes, and it was still expensive at $695.

In 1996, BellSouth introduced VAL, the first voice portal: a dial-in interactive voice recognition system that was supposed to give you information based on what you said into the phone. VAL paved the way for all the inaccurate voice-activated menus that would plague callers for the next 15 years and beyond.

2000s: Speech Recognition Plateaus--Until Google Comes Along

By 2001, computer speech recognition had topped out at 80 percent accuracy, and near the end of the decade the technology's progress seemed to have stalled. Recognition systems did well when the language universe was limited--but they were still "guessing" among similar-sounding words with the help of statistical models, and the known language universe kept growing as the Internet grew.

Did you know speech recognition and voice commands were built into Windows Vista and Mac OS X? Many computer users weren't aware that those features existed. Windows Speech Recognition and OS X's voice commands were interesting, but not as accurate or as easy to use as a plain old keyboard and mouse.

Speech recognition technology edged back into the forefront with one major event: the arrival of the Google Voice Search app for the iPhone. The impact of Google's app is significant for two reasons. First, cell phones and other mobile devices are ideal vehicles for speech recognition, since the desire to replace their tiny on-screen keyboards is a strong incentive to develop better, alternative input methods. Second, Google could offload the app's processing to its cloud data centers, harnessing all that computing power to perform the large-scale data analysis needed to match a user's words against the enormous number of human-speech examples it had gathered.

In short, the bottleneck in speech recognition has always been the availability of data and the ability to process it efficiently. Google's app adds the data from billions of search queries to its analysis, the better to predict what you're probably saying.
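To make that "guessing with statistical models" idea concrete, here is a minimal sketch (not Google's actual system, and the tiny corpus and candidate phrases are purely illustrative) of how a simple bigram language model built from text data can rank acoustically similar candidate transcriptions:

```python
from collections import Counter
from math import log

# Toy corpus standing in for the massive text/query data a real
# recognizer would learn from (illustrative only, not real data).
corpus = ("please recognize speech now "
          "speech recognition can recognize speech "
          "recognize speech accurately").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def score(sentence):
    """Log-probability of a word sequence under a bigram model
    with add-one smoothing: sum of log P(word | previous word)."""
    words = sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += log((bigrams[(prev, word)] + 1) /
                     (unigrams[prev] + vocab))
    return total

# Two transcriptions that sound nearly identical; the language model
# prefers the one whose word sequence is more common in the data.
candidates = ["recognize speech", "wreck a nice beach"]
print(max(candidates, key=score))  # -> "recognize speech"
```

The more text the model has seen, the better its odds of picking the phrase you actually said, which is why billions of search queries make such a difference.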


Originally published on PC World.