Ideally, the engine's database would contain every possible combination of sounds a human voice can produce, a goal that is nearly impossible to achieve. Instead, the software searches for a series of best-matching recorded segments and strings them together into a final audio stream. For nonstandard or foreign words, good matches can be very hard to find, which leads to incorrect results. "There are always things that the synthesizer has to actually synthesize, for example numbers or rarely used words," says Handsome's Zafarnia. "The former are not too difficult, [but the latter] are more difficult and have to be created [artificially]," which often results in unusual or incorrect pronunciation.
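The "best match" search described above is commonly done with a technique called unit selection. The Python below is a toy sketch of that idea, not how Siri or any particular engine works. For each sound it picks one recorded snippet, trading off how well a snippet fits the wanted sound against how smoothly it joins its neighbor. The `Unit` type, its single `pitch` feature, and both cost functions are hypothetical simplifications; real systems compare many acoustic features. A sound with no recording returns `None`, which is the case where, as Zafarnia notes, the engine has to synthesize something itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str    # which sound this recorded snippet covers (hypothetical, e.g. a diphone "h-e")
    pitch: float  # hypothetical single prosodic feature, in Hz


def target_cost(unit: Unit, wanted_pitch: float) -> float:
    # How far the recorded snippet is from the prosody we want.
    return abs(unit.pitch - wanted_pitch)


def join_cost(a: Unit, b: Unit) -> float:
    # Rough proxy for how audible the seam between two snippets would be.
    return abs(a.pitch - b.pitch)


def select_units(targets, database):
    """targets: list of (label, wanted_pitch); database: list of Unit.
    Returns the cheapest sequence of units, or None if a sound was never recorded."""
    if not targets:
        return []
    candidates = []
    for label, _ in targets:
        options = [u for u in database if u.label == label]
        if not options:
            return None  # no recording: the engine would have to synthesize this sound
        candidates.append(options)

    # best[i][j] = (lowest total cost ending at step i with candidate j, backpointer)
    best = [[(target_cost(u, targets[0][1]), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i][1]), back))
        best.append(row)

    # Trace back from the cheapest final snippet to recover the whole chain.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))


db = [Unit("h-e", 120.0), Unit("h-e", 180.0), Unit("e-l", 125.0),
      Unit("e-l", 175.0), Unit("l-o", 130.0)]
wanted = [("h-e", 125.0), ("e-l", 125.0), ("l-o", 125.0)]
print([u.pitch for u in select_units(wanted, db)])
```

Here the search settles on the snippets whose pitches stay close to the target and to each other, and skips the 180 Hz and 175 Hz recordings even though they cover the same sounds.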
Almost like the real thing
Making Siri talk requires the contribution of many different experts, from actors to engineers to voice specialists. And even with the best technology currently available, the occasional slurred word or mispronounced name is inevitable.
Still, for all their increasing accuracy, synthesized voices are no substitute for the real thing. "The human voice is the most dynamic instrument we know of, so one doesn't have to listen very closely to hear a lack of characteristic inflection and other qualities," stresses actor Scott Reyns, adding that "when emotion, engaging and compelling an audience, telling a story, or getting a message across that sells counts, companies hire the real thing: actual humans."