Microsoft Corp. is about to stir the speech recognition market with the launch of its Speech Server products next week. The vendor promises speech recognition for the masses, but analysts warn that speech-enabling applications is not easy.
Microsoft Chairman and Chief Software Architect Bill Gates is scheduled to formally launch Speech Server 2004 Standard Edition and Enterprise Edition at the SpeechTEK conference in San Francisco next week. The launch marks the Redmond, Washington-based company's entry into the server-based speech recognition market, where it will compete with vendors including Nuance Communications Inc., ScanSoft Inc. and IBM Corp.
"Our goal is to make speech recognition technologies mainstream," said James Mastan, director of marketing for Microsoft's Speech Server group. Microsoft plans to do that by offering speech recognition at a lower cost than competing products and by making it easier to deploy, manage, develop and maintain, he said.
The pitch is simple. Developers can add speech capabilities to existing Web applications built on Microsoft's ASP.NET application framework by adding code based on XML (Extensible Markup Language) and SALT (Speech Application Language Tags) using Visual Studio .NET. Speech Server answers calls and communicates with the Web server through XML and SALT, making applications that are offered online available over the phone, Mastan said.
Speech Server runs on Windows Server 2003. The Enterprise Edition needs to run on a separate physical server while Standard Edition, designed for small and medium-sized installations, can be placed on the same hardware as the Web server. Microsoft will recommend configurations and resellers will offer fully configured systems, Mastan said.
Users will like Speech Server because it is familiar, Mastan said. Developers can use Visual Studio and it runs just like any other Microsoft server product. "It is not some black box in a call center that you have to program for in some weird language and you can't maintain yourself because you don't know how it works," he said.
Microsoft's entry will stir the speech recognition market, according to Yankee Group and Gartner Inc. analysts. However, Microsoft has to prove itself in the market and users need to be aware that creating a speech recognition system is more complex than Microsoft makes it sound in its marketing messages, they said.
"Speech applications and a voice user interface are pretty tricky to do. That may well get lost in the first version of the Microsoft marketing hype that will go out there," said Steve Cramoysan, a principal analyst at Gartner. "If you're going to use Microsoft Speech Server, use professional services people who know exactly what they're doing."
In a research note last year, Yankee Group Senior Analyst Art Schoeller issued the same warning to potential Speech Server users. "It is dangerous to imply that any Web developer will speech-enable applications, because not all have proper training in the best practices for dialog design," he wrote.
Still, Microsoft's entry into the speech recognition market is a significant event, Cramoysan said. "Microsoft will certainly shake up this market, but I think we're going to be looking at the second and third version of this product when they will become much more competitive than with this first release of the product," he said.
Nuance, singled out by Mastan as Microsoft's chief rival, agrees with the analysts and goes a step further. "Microsoft is developing an inexpensive and easy way for developers to design really bad applications," said Kevin Chatow, principal product manager at Nuance in Menlo Park, California. Adding speech to Web applications will not necessarily result in usable applications, he said.
While Microsoft may like to position Nuance's product as obscure, Chatow pointed out that Nuance supports VoiceXML 2.0, a recognized standard, rather than SALT, which is still making its way through the standards process. Furthermore, the Nuance product isn't tied to Microsoft technologies; it also works with Java application servers.
Nuance plans to announce the third major release of its Nuance Voice Platform product on Tuesday. Release 3.0 adds support for Linux, alongside the existing Windows and Solaris support, as well as a new application design and deployment environment that promises to cut development costs by about a third, the company said in a statement.
Although the Nuance Voice Platform may cost more to acquire than Microsoft's Speech Server, the Microsoft offering may end up costing more over time because of technology upgrades, development quirks and other costs associated with setting up and running the product, Chatow said.
Gartner's Cramoysan said that while Microsoft does plan to offer its product at a lower price, that does not necessarily mean it will cost less over a longer timeframe. "Although Microsoft is talking about fairly aggressive pricing, it is an unproven product. We would caution people in terms of assuming that it would be lower cost in terms of total cost of ownership," Cramoysan said.
About 600 customers participated in Microsoft's Speech Server beta test and 30 in an early adopter program. One early adopter is Seattle-based Grange Insurance Group, which, with the help of a consultancy, developed a system its customers can call to check the status of their payments.
Grange uses Microsoft software across its business, except for its policy management system, which runs on an IBM mainframe. Going to Microsoft for speech recognition was a clear decision, said Ralph Carlile, chief information officer and vice president of technology at Grange.
"I didn't see any technology out there that I was interested in going after. Other products had high failure rates and did not offer integration into our back-end systems," he said.
Development of the speech recognition components took only two to three weeks, Carlile said. The company did have some trouble obtaining a telephony board for the server and connecting it to its phone system, he said. Grange is now testing the speech applications with 750 of its policyholders, and the results are good, he said.
Pricing for Microsoft's Speech Server products will be "an order of magnitude lower" than that of competing products, Mastan said. Details will be announced next week. In his research note, Yankee Group's Schoeller predicted that Microsoft will undercut the competition by about 30 percent.
Microsoft will offer free 180-day trial versions of its Speech Server software, which will initially be available only in U.S. English. General availability of the software is expected a few weeks after the launch.