Microsoft preps universal translator easier to carry than in Dr. Who or Star Trek

Translation carries words and tone of voice, translates signs, text, digital media

Every fan of science fiction movies – even Star Wars – has moments of emotional conflict over how badly even the best SF movies mangle the laws of physics, or even basic reality.

Even the original Star Trek – among the more physical-law-abiding TV series – had the occasional obvious gaffe.

[Even during the '60s, how many people really believed that humanity would be wise enough to miss the opportunity for nuclear holocaust and become self-consciously ethical, scientifically minded explorers of deep space whose judgment remained poor enough to wear that much velour?]

Of the very many SF movie tropes that demonstrate reality and violate it at the same time ("space fighters" can't swoop or make cool ray-gun noises audible through a vacuum), the hardest to exterminate is the universal translator. Earth audiences refuse to read subtitles, but only speak Earth languages (except a few self-taught speakers of Klingon, Elvish and Fanboid).

So Dr. Who's TARDIS magically projects language translation through time, space and all the sound stages of the BBC.

Star Trek lets its computer do the work, but waits a decent interval for it to learn the language of a new species.

Microsoft, which has been building speech-recognition and automatic translators for a couple of decades, demonstrated last week the result of its effort to avoid both TARDIS- and mainframish disembodied "Computer"-dependent translation by designing voice-recognizing translators to run on virtually any handy electronics.

Microsoft Research showed off the Microsoft Translating Telephone, its latest high-water mark, at last week's TechFest 2012 – a showcase for the bleedingest edge of the future of Microsoft feature glut.

The technology is a lot better, a lot lighter and a lot faster than in the days when it ran only on the most powerful computer clusters and even its own developers called speech-to-text the "wrecked a nice beach" technology because it couldn't reliably recognize speech.

The big step forward is a text-to-speech app able to run on smartphones, translate spoken words into many different languages and keep the speaker's own voice and intonation largely intact while doing it.

The system, which is designed to translate both spoken words and written text on road signs and other instructions that are vital gibberish to monolingual travelers, has to be trained for an hour or so to get the intonations and vocal qualities of one speaker correct, according to Microsoft Principal Researcher Frank Soong of Microsoft's Beijing research operation, which developed the latest speech app.

In demonstrations at TechFest, Soong translated the recorded words of Microsoft honcho Craig Mundie into Mandarin without losing the recognizable qualities of Mundie's voice.

He also translated Microsoft Research boss Rick Rashid from English into Spanish, Italian and Mandarin.

The ability to handle many languages on many devices for spoken words, digital content and text on paper, signs or other physical media is part of an overall effort at Microsoft Research to mix physical and virtual worlds to make each simpler.

Other demonstrations showed the potential benefit of controlling a PC using arm gestures picked up by the web cam/motion-capture capability the Xbox Kinect uses for gaming, for example.

Using a concept called "sensor fusion," Microsoft researchers demonstrated the ability to combine location data from a mobile phone and a Kinect web cam to allow computers attached to the Kinect to apply graphics to the actual location of the phone – to make it a ping-pong paddle or presentation pointer without the need to use a specialized device.

Communicating in multiple media and several languages with both other humans and machines is a "tectonic shift in our relationship with technology," according to a Microsoft Research blog entry.

Augmented reality and controls that cross both real and virtual worlds are technically difficult, expensive to develop and only theoretically profitable in the future of whatever vendor builds a really effective one.

Microsoft, always a fast follower rather than an innovator of new technology, has consistently poured time and money into technology that can do more than wreck a nice beach. Its work has helped develop a few decent product features and raise the average technical capability for both itself and its competitors by making abstruse technology common in Windows.

It has rarely hit a financial or technological home run despite success in multi-touch computer screens, high-performing flight simulators and other demanding technologies.

The popularity of Kinect is an exception. The multilanguage speech-recognition engine demo'd at TechFest may be another.

There is a far larger potential market for that level of translation than there is for a video game add-on that lets you physically swing a virtual golf club, especially considering how difficult it is even for people speaking the same language to understand one another.

Unfortunately, none of the demonstrations showed whether it is practical to translate Republican to Democrat or back again, so even if multi-language multi-reality translation becomes a huge hit, it may never master the difficult, reality-warping speech patterns of Capitol Hill.

So far there is no simple translation technology capable of handling doublespeak.

Read more of Kevin Fogarty's CoreIT blog and follow the latest IT news at ITworld. Follow Kevin on Twitter at @KevinFogarty. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
