December 13, 2010, 3:15 PM — For years, using voice recognition technology on phones or other devices has been a novelty -- something people try once but never again, usually because it works so poorly. But recent developments, including harnessing the computational power of the cloud, have made it more usable and will make it even better in the near future, according to Microsoft.
Of all the services Microsoft hosts, speech recognition uses one of the largest cloud systems the company has, said Zig Serafin, general manager for speech at Microsoft. It includes the voice response systems used by the customer-service phone lines of large companies like Orbitz and American Airlines, as well as the technology that lets mobile Bing users search by voice and Ford Sync users ask for directions.
Microsoft got into the field when it acquired Tellme in 2007. Voice recognition had already been around for years, but it didn't work very well.
"Even just standing in a quiet room back in the day trying to use some of the embedded software on a mobile phone was just painful," said Will Stofega, an analyst at IDC.
But the technology has improved enough that of all mobile searches handled by Microsoft, 20 percent now come in using voice, Microsoft said.
Microsoft uses the cloud to collect information about how people use the service as a way to improve it. For instance, if a user speaks "Italian restaurant Seattle" into Bing on a Windows Phone 7 device, Microsoft knows whether the user then clicks on a result, presumably having found the answer they wanted. If the user instead speaks the search query a few more times, that indicates Microsoft probably didn't recognize the words correctly. Microsoft also collects information about phone connectivity, in case a weak connection is partly to blame for poor results.
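The signals described above (a click after a query, repeated queries, and connection quality) can be illustrated with a small sketch. This is not Microsoft's implementation, which the company did not describe in detail; the `Event` type, the `label_session` function, the label strings and the `retry_threshold` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One step in a hypothetical voice-search session."""
    kind: str       # "voice_query" or "click" (assumed event types)
    text: str = ""  # recognized query text or clicked result, unused by the heuristic


def label_session(events: List[Event],
                  poor_connectivity: bool = False,
                  retry_threshold: int = 2) -> str:
    """Guess whether recognition succeeded, using only the signals in the article.

    The threshold and label names are assumptions, not values from Microsoft.
    """
    # Count how many times the user spoke a query before the first click.
    queries_before_click = 0
    clicked = False
    for event in events:
        if event.kind == "voice_query":
            queries_before_click += 1
        elif event.kind == "click":
            clicked = True
            break

    # Repeating the query suggests the first attempts were not understood,
    # unless a weak connection may be partly to blame.
    if queries_before_click >= retry_threshold:
        return "possible_network_issue" if poor_connectivity else "likely_misrecognized"

    # A single query followed by a click suggests the user got a useful answer.
    if queries_before_click == 1 and clicked:
        return "likely_correct"

    return "unknown"


session = [Event("voice_query", "Italian restaurant Seattle"), Event("click")]
print(label_session(session))
```

A session labeled this way could feed either the automated part of a feedback loop or a queue for human review; where to draw that line is a design choice the sketch does not attempt to settle.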
"That [data] becomes valuable to help improve the underlying science on the system," Serafin said.
Google, which also lets users search by voice and has other offerings that use voice recognition, similarly uses back-end processing to learn from the way that people use the services.
At Microsoft, because the same back-end system handles speech recognition in multiple products, the company is now processing about 11 billion speech requests in a year, it said. On its new Windows Phone 7 devices, users hold down the home button to launch the speech feature, which can be used to control many applications on the phones.
Microsoft sifts through that massive volume of data from a network operations center in Silicon Valley. "It's fascinating to see the number of requests coming in," Serafin said. "It's like walking into a mini version of NASA."
Some elements of the feedback loop are automated so that the speech recognition engine itself is capable of parsing the data, he said. Other data is examined more closely by experts, who might then make changes to the system.