December 01, 2012, 7:03 AM — You decide what you want to say. You say it. The words appear on the screen.
Forget the frustrating months it took you to learn typing. In fact, you can forget that writing involves any particular effort. Today's powerful, multi-core computers, combined with the latest speech recognition software and a good microphone, can produce results that are, frankly, startling.
The technology has gotten so good, in fact, that the weak link in the system appears to be the user's ability to dictate. While this may sound like a trivial point, dictation turns out to be a distinct skill that involves factors that are not intuitive. But once the skill is mastered, keyboarding seems painfully primitive.
Dragon NaturallySpeaking corrects a dictated sentence from Shakespeare's Hamlet: The word "town" is changed to "tongue." In this case the correct alternative is second on the list and can be designated by saying "Choose two."
While newer speech recognition mobile apps such as Siri and Google Now have grabbed most of the headlines, one of the longest-running and best-known speech recognition software packages is Dragon NaturallySpeaking from Nuance.
There are a variety of versions available. For this review, I tried out Dragon NaturallySpeaking 12 Premium for Windows PCs, available for $199.99. Other versions include a Home Edition for $99.99, which does not integrate with spreadsheets or support off-line dictation and has no playback facility; a Professional Edition with enterprise-level administrative, customization, and multi-user features for $599.99; and a similar Legal Edition with a law office vocabulary, also for $599.99. There is a version for the Mac called Dragon Dictate ($199.99), along with specialized Mac products for legal and medical workers.
A bit of background: I'm not new to speech recognition. In fact, I've been using PC-based speech recognition on and off for nearly two decades to alleviate the stresses of keyboarding. At first, speech recognition packages were more like frustrating toys with maddening limitations, but they have steadily improved over time.
The crossover point was probably NaturallySpeaking version 8 in 2004, when the utility of speech recognition finally outweighed its limitations. But limitations remained: speech recognition was still more reliable with long words than with short ones (making it popular with doctors); misinterpreted words were often rendered as commands with random and startling results (Bill Gates himself was the victim of this at a live demo in 2006); the software's demand on the hardware was nontrivial (so that switching between documents could be painfully slow); and the software could get confused to the point that it stopped listening.
The skill of dictation
Here are some tips you can follow that will make your use of voice recognition software easier and more effective:
- Enunciate carefully and speak slowly enough that each word gets its due (although you don't have to go too slowly). Remember, you are controlling a machine, not talking to a person.
- While speaking, envision the text you are seeking to produce. This will help you give equal heed to each word (so the computer can too), keep a steady rhythm and suppress "dysfluencies" like, ah, y'know.
- Watch the results on the screen as you go along. This may slow you down but will enhance your accuracy. To paraphrase Wyatt Earp: It's good to be fast, but it's better to be accurate.
- Even a momentary loss of focus can lead to misrecognition, especially of one-syllable words. But if you can maintain focus, the results can be far more accurate than typing.
- A big issue for novices is that they have learned to "think with their fingers," so suddenly removing the keyboard is a major impediment to composition. I have found it best to just speak the text as it comes to you without stopping for mistakes. You can edit it later.
- Finally, there is the environment. Background silence is best, but droning ventilators hurt recognition more than office chatter. Meanwhile, if you don't mind being overheard on the phone then you won't mind being overheard while dictating -- otherwise, find an office. You can use about the same volume for the phone and for speech recognition.
But with version 12, these factors have faded into the background (although they haven't entirely disappeared). For example, you can dictate effectively at about half the speed of an auctioneer -- should you prove able to do so. Assuming that you stay focused while dictating, the error rate is now trivial (see sidebar).
An important part of that new reliability is the noise-canceling headset microphone supplied with the software, which does not react to background noise. It made things a lot easier for me -- I had to turn off my previous microphones every time I stopped speaking to keep them from picking up other sounds. The Home and Premium versions come with a two-speaker analog headset, while the Professional and Legal versions come with a one-speaker USB headset.
Version 12 is outwardly not very different from previous versions, with the same interface and basic command scheme. The vendor claims that accuracy out-of-the-box is 20% better than that of version 11, and in my testing, that did seem to be the case. New features include an interactive tutorial, Bluetooth support, and enhanced support for Gmail and Hotmail.
Dragon installs from a CD; during the installation, it asks a number of questions about your age, gender and accent. (It also tests the microphone, and in my case was not happy until I had tried several ports.) It then listens to your voice during a short training session that takes about five minutes. (With early versions, the training could easily take 45 minutes.) You have the option to let it examine your document folders and outgoing email folders to look for commonly used words.
When invoked, Dragon puts a thin control bar across the top of the screen. You click an icon in this control bar to turn on the microphone. When you start to talk, text appears at the cursor. If you talk quickly, the text may fall as much as a sentence behind, but I found it invariably caught up fairly quickly. Punctuation marks must be pronounced.
If word X is misrecognized, you can adjust the software by saying "Correct X." Word X will then be selected and Dragon will present a list of possible corrections. If none of them match, you can spell the desired word. Thereafter, Dragon is more likely to recognize the word correctly. (With version 12, I found that one correction was always enough.)
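For readers who think in code, the correction loop described above can be pictured with a toy model. This is only an illustrative sketch, not how Dragon works internally: the `ToyCorrector` class, its method names, and the candidate words other than "town" and "tongue" are all hypothetical, and the ranking rule (most frequently chosen alternative first) is an assumption made for the example.

```python
from collections import defaultdict


class ToyCorrector:
    """Toy model of a 'Correct X' / 'Choose N' workflow.

    Hypothetical illustration only; this is not Dragon's actual algorithm.
    """

    def __init__(self, confusions):
        # Maps a recognized word to the candidate replacements offered for it.
        self.confusions = confusions
        # counts[heard][chosen] = how often the user picked `chosen` for `heard`.
        self.counts = defaultdict(lambda: defaultdict(int))

    def alternatives(self, heard):
        """Return the candidates for `heard`, most frequently chosen first."""
        candidates = self.confusions.get(heard, [])
        # sorted() is stable, so candidates with equal counts keep their order.
        return sorted(candidates, key=lambda word: -self.counts[heard][word])

    def choose(self, heard, number):
        """Apply 'Choose <number>' (1-based) and remember the user's choice."""
        chosen = self.alternatives(heard)[number - 1]
        self.counts[heard][chosen] += 1
        return chosen


# Assumed candidate list for the misrecognized word "town".
corrector = ToyCorrector({"town": ["down", "tongue", "ton"]})
first_list = corrector.alternatives("town")   # list shown after "Correct town"
picked = corrector.choose("town", 2)          # the user says "Choose two"
second_list = corrector.alternatives("town")  # list after one correction
print(first_list, picked, second_list)
```

In this sketch a single correction is enough to move "tongue" to the top of the list, which mirrors the one-correction behavior I saw in version 12, although the real software presumably adapts its acoustic and language models rather than keeping a simple tally.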