Speech recognition applications hold a great deal of promise for radiology. Ideally, they decrease report turnaround time, improve radiologist efficiency, and reduce costs. In practice, however, achieving these goals has met with uneven success.
"In order to improve efficiency, speech recognition must fulfill the following requirements: it must have a low error rate; it must be user friendly and well accepted by radiologists; and it must not distract the radiologist from the task of reading images," said Dr. Julia Crim.
Crim, an associate professor of radiology at the University of Utah in Salt Lake City, delivered a primer on maximizing the capabilities of speech recognition software at the 2003 RSNA conference in Chicago.
For Crim, there’s no question that speech recognition is preferable to traditional dictation/transcription-based reporting.
"On a personal level I’ve seen no fall in hourly productivity with speech recognition, I have the highest productivity in my department, and I’ve experienced improved clinician satisfaction as a result of using the application," she said.
There are some disadvantages to speech recognition compared to traditional transcription-based reporting, she said.
"Speech recognition cannot understand the context of the dictation. It doesn’t delete the word 'uh,' and it makes more errors than your best transcriptionist," she observed.
Speech recognition training generally takes place in two steps. The first is training conducted by the application vendor. The second is performed by radiologists who train the system on their own.
Training days
For the second phase of training, Crim advocates that at first the radiologist speak naturally, but slowly. After a few days, the application will recognize the radiologist’s intonation and speech patterns, allowing reports to be dictated at a normal pace.
"The software cannot understand a monotone well, because it loses important clues to recognize phrases. And speaking in phrases trains the system to understand words as they sound in a sentence," she said.
To improve number recognition, Crim suggested saying the word "numeral" before a number. For dates, she suggested separating the numbers with the word "slash." And for times, she says the time followed by either "a.m." or "p.m."
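As a rough illustration of why those spoken cues help, the sketch below shows how a transcript containing them could be turned into written numbers and dates. It is not how any particular product works: the function name is hypothetical, and it assumes the engine has already rendered the spoken numbers as digits.

```python
import re

def apply_spoken_cues(transcript):
    """Turn 'numeral 3' into '3' and '2 slash 4 slash 2004' into '2/4/2004'.

    Assumption: the digits themselves were already recognized; only the
    cue words "numeral" and "slash" are handled here.
    """
    # Drop the cue word "numeral" when it directly precedes digits.
    text = re.sub(r"\bnumeral\s+(\d+)", r"\1", transcript)
    # Replace "slash" between two groups of digits with a "/" character.
    text = re.sub(r"(\d+)\s+slash\s+(?=\d)", r"\1/", text)
    return text

print(apply_spoken_cues("numeral 3 views obtained on 2 slash 4 slash 2004"))
```

The cue words give the software an unambiguous marker, so it does not have to guess whether a spoken "two" means the number or the word "to."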
If the software has a correction mode for words or phrases, Crim supports its use as a crucial part of the training process.
Phrasing
"As you speak, the speech engine is calculating the probability that you will use any three words together. When you correct a phrase, the computer has a more distinctive combination of sounds to add to its memory than it does if you correct a single word. Also, when you speak a phrase, the computer learns the sound of the word used in context rather than its sound in isolation," she said.
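Crim's description of three-word probabilities can be made concrete with a small sketch. The code below counts three-word sequences in a handful of invented dictations and estimates how likely a word is to follow a given two-word context. The sample sentences and function names are made up for illustration, and a commercial engine's actual statistics are certainly more elaborate than this simple count-based estimate.

```python
from collections import Counter

def trigram_counts(sentences):
    """Count every run of three consecutive words in the dictated text."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts

def trigram_probability(counts, w1, w2, w3):
    """Estimate P(w3 | w1 w2) as count(w1 w2 w3) / count(w1 w2 anything)."""
    context_total = sum(
        c for (a, b, _), c in counts.items() if (a, b) == (w1, w2)
    )
    if context_total == 0:
        return 0.0
    return counts[(w1, w2, w3)] / context_total

# Hypothetical dictations, invented for this example.
dictations = [
    "fracture of the iliac bone",
    "fracture of the iliac crest",
    "lesion in the iliac bone",
]
counts = trigram_counts(dictations)
print(trigram_probability(counts, "the", "iliac", "bone"))
```

Each corrected phrase adds several three-word sequences to the counts at once, which is one way to see why correcting a phrase teaches the system more than correcting a single word.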
In speech recognition technology, any two or more words that are consistently used together are treated as a "bound phrase." As part of the application training process, Crim entered as words in the speech recognition vocabulary any bound phrases that the application had difficulty recognizing. For example, she entered the phrases "bare area," "Weber type B," "involving the basis," and "line of mechanical axis" as bound phrases in her vocabulary.
Macros (sets of instructions that are recorded, saved, and assigned to a short code) and structured reporting (in which information is standardized and presented in a clear, organized format) are two tools that Crim uses to increase her efficiency.
Crim creates her speech recognition macros around three areas: standard normals; technique only, such as a hip injection; and standard abnormals. She recommends having easily remembered names for the macros, such as "pelvis trauma" or "female abdomen CT," as it is more efficient to invoke the macro by voice command than by accessing and inserting it via a pull-down menu.
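The idea of calling up a macro by an easily remembered name can be sketched as a simple lookup. The two macro names below come from Crim's examples; the report text is placeholder wording, and the function is a hypothetical stand-in for whatever the speech recognition product actually does when it hears a macro name.

```python
# Macro names follow the article; the report text is placeholder wording.
MACROS = {
    "pelvis trauma": "[standard normal text for a pelvis trauma study]",
    "female abdomen ct": "[standard normal text for a female abdomen CT]",
}

def find_macro(spoken_name):
    """Return the macro text for a spoken name, or None if no macro matches."""
    # Normalize case and spacing so small variations still match.
    key = " ".join(spoken_name.lower().split())
    return MACROS.get(key)

print(find_macro("Female abdomen CT"))
```

A single spoken name maps straight to the stored text, which is why a memorable name is faster than hunting through a pull-down menu.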
Form and structure
She has also used structured reporting to great advantage in her practice. Each element of the structured report, such as history, study date, procedure, findings, and impressions, is given its own line space. She has inserted default wording for a normal scan in brackets so it can be left in the report as is or overwritten to describe abnormalities.
"Structured reporting improves the readability and consistency of reports and decreases the chances for errors in speech recognition. It also allows mistakes to be caught more easily than they would be in a long paragraph. In addition, it’s preferred by the referring clinicians," she said.
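A structured report with bracketed defaults, as Crim describes it, might look something like the sketch below. The field names are taken from her description; the default wording, the function name, and the sample input are placeholders invented for illustration, not her actual templates.

```python
# Field names come from the article; the bracketed defaults are placeholders.
FIELDS = ["History", "Study date", "Procedure", "Findings", "Impressions"]
DEFAULTS = {"Findings": "[Normal.]", "Impressions": "[Normal study.]"}

def render_report(dictated):
    """Put each element on its own line, keeping the bracketed default
    unless the radiologist dictated something for that element."""
    lines = []
    for field in FIELDS:
        text = dictated.get(field) or DEFAULTS.get(field, "")
        lines.append(f"{field}: {text}".rstrip())
    return "\n".join(lines)

# Hypothetical dictation that overwrites only the findings.
report = render_report({
    "History": "Hip pain.",
    "Findings": "Hypothetical abnormal finding.",
})
print(report)
```

Because each element sits on its own line, a misrecognized word stands out more readily than it would in a long paragraph.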
Adapting her speech patterns is another technique that Crim uses to get the most from her speech recognition software. For example, to alleviate the confusion in the application between ileum and ilium, she now dictates by saying "iliac bone."
For speech recognition to increase productivity, a radiologist needs to maximize the time spent looking at images and minimize the time spent looking at the dictation screen.
Crim uses the following six-step process for her reading workflow with speech recognition:
- Bring the patient into the dictation system (either via a bar code or PACS).
- Perform an initial evaluation of the study.
- Select a macro template for dictation.
- Dictate while continuing to review the study.
- Check the speech recognition output before signing off.
- Fax the report as needed to referring physicians.
"If you’re looking at the dictation screen, this severely impacts your thought processes and efficiency. Train yourself not to peek at the report until a major portion of it is finished," she suggested.
By Jonathan S. Batchelor
AuntMinnie.com staff writer
February 4, 2004