A continuous speech recognition program was found to be 90% accurate, although the figure varied slightly for non-native English speakers, according to a study published in the latest issue of the Journal of Digital Imaging.
Researchers from the departments of radiology at the University of Washington in Seattle and the Mayo Clinic in Rochester, MN, evaluated the performance of a single continuous speech recognition software product (IBM MedSpeak/Radiology, version 1.1). The team randomly selected a test set of two reports from each of six imaging modalities.
After completing the enrollment process for the software, each speaker (three men and three women) dictated the 12 reports in random order. The speakers were advised to use their normal rate and volume, and to avoid watching the computer monitor while dictating, according to the researchers (Journal of Digital Imaging, March 2001, Vol.14:1, pp 30-37).
Each transcribed report was compared automatically with the original report text using a software program for text document comparison (DocuComp II, Advanced Software, Sunnyvale, CA). Once the reports were compared, the researchers classified each discrepancy as one of four error types:
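This kind of automated word-by-word comparison can be sketched with Python's standard difflib module; the function below is a rough, hypothetical stand-in for a commercial tool like DocuComp, not the study's actual procedure. Each discrepancy it surfaces would then be classified by hand.

```python
import difflib

def word_diffs(original: str, transcribed: str):
    """Return word-level discrepancies between two report texts.

    A rough stand-in for a commercial comparison tool such as DocuComp;
    each returned discrepancy would then be classified manually (class 0-3).
    """
    orig_words = original.split()
    trans_words = transcribed.split()
    matcher = difflib.SequenceMatcher(a=orig_words, b=trans_words)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" runs are matching text; anything else is a discrepancy
        if tag != "equal":
            diffs.append((tag, orig_words[i1:i2], trans_words[j1:j2]))
    return diffs

# Example: "no" misrecognized as "new" -- a subtle change in meaning
print(word_diffs("no evidence of effusion", "new evidence of effusion"))
# → [('replace', ['no'], ['new'])]
```

A word-level (rather than character-level) alignment keeps the diff units matching the units a proofreader would actually judge.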
- Class 0: No change in meaning with respect to the original report text, and the transcribed text was grammatically correct.
- Class 1: No change in meaning, but the transcribed text was grammatically incorrect.
- Class 2: A change in meaning with respect to the original report text, but the error was judged to be obvious.
- Class 3: A change in meaning with respect to the original report text, and the error was judged not to be obvious.
The overall error rate of the transcribed reports was 10.3%. The rate of significant errors (class 2 or 3) was 7.8%, with an average of 7 significant errors per 87-word report. Subtle significant errors (class 3) were found in 1.2% of cases, or on average, about 1 error per 87-word report. Of the 84 total reports transcribed in the study, 50 had one or more subtle significant errors.
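The per-report figures follow directly from the rates and the 87-word average report length. A minimal sketch (the function name and class-count dictionary are illustrative, not from the study) shows how such rates would be computed from tallied discrepancies:

```python
def error_rates(class_counts: dict, total_words: int) -> dict:
    """Compute error rates from per-class discrepancy counts.

    class_counts maps error class (0-3) to its count; significant
    errors are class 2 or 3, subtle significant errors are class 3.
    """
    total_errors = sum(class_counts.values())
    significant = class_counts.get(2, 0) + class_counts.get(3, 0)
    subtle = class_counts.get(3, 0)
    return {
        "overall": total_errors / total_words,
        "significant": significant / total_words,
        "subtle": subtle / total_words,
    }

# The reported rates, scaled to the average 87-word report:
print(round(0.078 * 87))  # significant errors per report → 7
print(round(0.012 * 87))  # subtle significant errors per report → 1
```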
"The efficiency of the proofreading process necessary for error detection would be much improved if an indication of the confidence of the speech recognition algorithm, in its interpretation of the audio, could be provided by the program," the researchers wrote. "This could be indicated using color-coded highlighting of the report text. The radiologist could rapidly skim areas of high confidence, and focus mainly on areas of low confidence."
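The color-coded highlighting the researchers propose could be sketched as follows, assuming a hypothetical recognizer that emits per-word confidence scores (the threshold and output format here are illustrative):

```python
def highlight_by_confidence(words, threshold: float = 0.85) -> str:
    """Render recognized words as HTML, flagging low-confidence words.

    `words` is a list of (word, confidence) pairs as a hypothetical
    recognizer might emit. Words below the threshold are wrapped in a
    <mark> tag so the proofreading radiologist can skim high-confidence
    text and focus on the flagged areas.
    """
    parts = []
    for word, conf in words:
        if conf < threshold:
            parts.append(f'<mark title="confidence {conf:.2f}">{word}</mark>')
        else:
            parts.append(word)
    return " ".join(parts)

print(highlight_by_confidence([("No", 0.97), ("acute", 0.55), ("findings", 0.92)]))
# → No <mark title="confidence 0.55">acute</mark> findings
```

The design assumption is that most misrecognitions correlate with low algorithm confidence, so flagged spans concentrate the proofreading effort.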
Statistically significant differences were observed in accuracy between native and non-native English speakers, although the accuracy values were similar, according to the study team. The researchers found no statistically significant differences between the male and female speaker groups, or between different report lengths.
"Practical implementation issues (rather than recognition accuracy) currently limit the widespread routine use of the system in radiology, although niche applications (e.g. after-hours emergency room interpretation) likely will benefit from use of the system," the researchers wrote. "It is expected that the use of continuous speech recognition systems interfaced with an RIS, and used along with PACS, will remove the major practical impediments to routine applications."
By Erik L. Ridley, AuntMinnie.com staff writer
April 2, 2001
Related Reading
SCAR expands program for 2001 meeting, March 20, 2001
Voice recognition provides speed, cost-benefit advantages, November 29, 2000
Voice recognition systems can be worth the extra work, October 3, 2000
A guide to multiculturalism in the radiology workplace, December 4, 1999
Copyright © 2001 AuntMinnie.com