Speech recognition has been a positive force in improving the efficiency of clinical documentation in limited practice areas like radiology. Giga Information Group projects that the speech recognition software market will grow from $100 million in 2000 to $2.5 billion by 2005, with healthcare representing a significant piece of the pie.

As speech systems begin to spread slowly across the healthcare enterprise, a major opportunity exists to extend the technology for the benefit of the other key stakeholder in this productivity equation: transcriptionists.

Due to the continuing shortage of transcriptionists, cost concerns and the persistent increase in the amount of documentation required, pressure is intensifying to limit or even eliminate the use of transcriptionists. Giga estimates that hospitals spend $1,200 or more per month per physician on transcription, and the American Association for Medical Transcription estimates that $20 billion is spent annually on medical transcription services.

In cases when a physician’s encounter with the patient is especially brief or routine, it may be viable to reduce reliance on transcriptionists. Speech recognition, combined with enhancements like templates, structured notes, menu-driven forms and other tools, will allow physicians to complete reports with just a few utterances and a couple of taps of the stylus on a PDA, thus bypassing transcription altogether. At least, that’s the vision.

As the volume of reports continues to climb, hospitals will strive to keep transcription costs low, typically by deploying productivity-enhancing transcription platforms, employing techniques such as incentive pay for transcriptionists–directly or through outsourcing–and enabling transcriptionists to work from home.

Another approach worth exploring is to leverage speech technology for transcriptionists, thus increasing their productivity. By extending the reach of speech technology into the business process of transcription, hospitals can further streamline transcription workflow–after some change in user behavior–advance patient care through improved report turnaround, and impact the payment cycle by facilitating timely coding and increasing free cash flow.

This perspective of clinical documentation is becoming more common, as budget-conscious hospitals replace aging stand-alone dictation systems with integrated solutions designed to increase efficiency throughout the entire clinical documentation lifecycle.

Speech Recognition Innovations

Speech recognition comes in two forms: front-end, in which the physician sees the recognized text directly on a PC, and back-end, in which the system completes speech recognition and sends the recognized text together with the original voice file to an editor for corrections.

With front-end systems, users need to adjust their normal dictation pattern to optimize for higher recognition accuracy. As users dictate, they can view and edit the document as it appears.

Several factors have enabled front-end systems to gain a foothold in radiology. Since radiologists use a relatively small vocabulary of about 50,000 words, front-end speech systems for radiology take less effort to correct and train. This, in turn, limits the number of mistakes generated with each report and minimizes disruption to workflow. Radiologists also generate large amounts of dictation per physician, thus providing an easy return on investment (ROI) justification for the dollars spent in acquiring the technology–typically from $5,000 to $10,000 per license.

This environment has not been without challenges. Most physicians are not inclined to devote the hours necessary to train and customize the system to recognize and interpret voices to the necessary level of accuracy. This is understandable, given the rigorous deadline pressures and powerful financial drivers that radiologists have to contend with to maintain a high throughput of reports. Besides the high licensing fee, the software has to be integrated with a hospital’s ADT feed and a document management system, and customized to include hospital-specific voice-activated templates, physician-specific normals (preset blocks of formatted text for repetitive dictations) and physician training. Many hospitals concerned with near-term ROI and changes in physician behavior have stayed on the sidelines.

Since cost and workflow issues block wider acceptance of front-end speech systems, a different approach has emerged: back-end speech recognition. With back-end systems, users dictate at their normal pace into a phone, PC microphone or handheld device; the text is generated behind the scenes on the “back end.” Once dictation is completed, the voice file is processed by the system’s speech recognition engine to generate a text file that the medical editor or transcriptionist edits.

The key advantage to back-end systems is that they allow users to maintain their customary workflow without changing their dictation pattern to accommodate the technology. Physicians can go back to seeing patients without diverting time to clean up dictations. Meanwhile, the speech engine will compare the original text against the edited version provided by the transcriptionist and “remember” the corrections, thus improving recognition accuracy over time.
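The comparison step described above can be sketched in a few lines. This is only an illustration, not how any particular speech engine works: it assumes the recognized draft and the edited report are available as plain strings, and the function name `extract_corrections` and the `correction_memory` counter are hypothetical. It uses Python's standard `difflib` module to find the phrases the transcriptionist changed.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_corrections(recognized: str, edited: str) -> list[tuple[str, str]]:
    """Return (recognized phrase, corrected phrase) pairs where the editor replaced words."""
    rec_words = recognized.split()
    edit_words = edited.split()
    # Compare word by word; autojunk is disabled so common words are never ignored.
    matcher = SequenceMatcher(a=rec_words, b=edit_words, autojunk=False)
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            corrections.append((" ".join(rec_words[i1:i2]), " ".join(edit_words[j1:j2])))
    return corrections


# Hypothetical correction memory: counts how often each substitution has been seen.
correction_memory: Counter = Counter()

recognized = "no evidence of new monia in the left long"
edited = "no evidence of pneumonia in the left lung"
for pair in extract_corrections(recognized, edited):
    correction_memory[pair] += 1

print(correction_memory)
```

A production engine would presumably feed such differences back into its acoustic and language models; the counter here only shows that the corrections can be recovered without any extra effort from the physician or the editor.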

Recent Changes in Technology

Just as the first iterations of speech recognition technology were designed to make document generation more cost-efficient, so, too, have recent improvements in the technology:

* the integration of comprehensive vocabulary sets for various healthcare specialties;

* more sophisticated engines that can compensate for dead air or distinguish between ambient sounds and voices;

* the incorporation of structured text and templates for documents; and

* the ability to accommodate more input devices.

While the emphasis of back-end speech recognition technologies has been not to alter physician behavior, transcriptionists have had to adapt. The more than 200,000 transcriptionists in the U.S. have trained themselves, over many years, to listen to dictation through their ears and type using their fingers. The back-end speech recognition paradigm requires them to listen to the dictation through their ears, watch recognized text through their eyes and type corrections using their fingers.

Furthermore, there is no integration of process or technology to aid transcriptionists as they toggle back and forth between text files that are generated by speech recognition systems and recordings produced by traditional phone-based dictation.

Improving Workflow

A number of technology improvements hold the potential to allow transcriptionists to reap the benefits of speech technology, while easing the transition from transcriptionist to editor:

Voice and text bookmarking. Bookmarking speech recognition-generated text to match corresponding sections of dictation adds an extra measure of efficiency to the editor’s workflow. An editor now can edit a particular part of the document and hear the corresponding dictation without searching for it using a foot pedal.
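One way to picture bookmarking is as a lookup from a position in the text to a position in the recording. The sketch below assumes the recognition engine reports a start time for each recognized word; that input format, the `WordBookmark` class and both function names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class WordBookmark:
    text: str
    char_start: int  # offset of the word in the recognized text
    audio_ms: int    # offset of the word in the dictation recording


def build_bookmarks(timed_words: list[tuple[str, int]]) -> tuple[str, list[WordBookmark]]:
    """Join (word, audio_ms) pairs into a document and record where each word starts."""
    bookmarks = []
    pieces = []
    cursor = 0
    for word, audio_ms in timed_words:
        bookmarks.append(WordBookmark(word, cursor, audio_ms))
        pieces.append(word)
        cursor += len(word) + 1  # +1 for the joining space
    return " ".join(pieces), bookmarks


def audio_offset_for_cursor(bookmarks: list[WordBookmark], cursor_pos: int) -> int:
    """Return the audio offset of the word at or just before the cursor position."""
    starts = [b.char_start for b in bookmarks]
    index = max(bisect_right(starts, cursor_pos) - 1, 0)
    return bookmarks[index].audio_ms


timed = [("chest", 0), ("x-ray", 420), ("shows", 900), ("no", 1310), ("infiltrate", 1500)]
document, marks = build_bookmarks(timed)

# Clicking on "shows" in the text yields the point in the recording to play from.
seek_ms = audio_offset_for_cursor(marks, document.index("shows"))
print(seek_ms)
```

With such an index, the editor's software can jump the playback position to wherever the cursor sits, which is what removes the need to hunt for the passage with a foot pedal.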

Highlighting low-recognition-confidence sections. By presenting editors with multimedia documents that automatically highlight the sections of documents that have a level of recognition confidence below a certain predetermined percentage, editors can directly transcribe such sections without changing their behavior, while accepting the rest of the recognized document as is.
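The highlighting idea can be sketched as follows, assuming the engine attaches a confidence score between 0 and 1 to each recognized word. The 0.80 threshold, the double-bracket markup and the function names are illustrative assumptions, not features of any specific product.

```python
def low_confidence_spans(words, threshold=0.80):
    """Group consecutive low-confidence words into (start, end) index spans, end exclusive."""
    spans = []
    start = None
    for i, (_, confidence) in enumerate(words):
        if confidence < threshold:
            if start is None:
                start = i  # a new low-confidence run begins here
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(words)))
    return spans


def render_for_editor(words, threshold=0.80):
    """Return the text with low-confidence runs wrapped in [[ ]] for the editor to review."""
    pieces = []
    pos = 0
    for start, end in low_confidence_spans(words, threshold):
        pieces.extend(w for w, _ in words[pos:start])
        pieces.append("[[" + " ".join(w for w, _ in words[start:end]) + "]]")
        pos = end
    pieces.extend(w for w, _ in words[pos:])
    return " ".join(pieces)


recognized = [("patient", 0.97), ("denies", 0.95), ("dis", 0.41),
              ("nea", 0.38), ("on", 0.92), ("exertion", 0.90)]
print(render_for_editor(recognized))
```

Grouping adjacent low-confidence words into one span matters in practice: a misrecognized term often breaks into several fragments, and the editor would rather retype one marked phrase than fix each fragment separately.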

IP telephony. As hospitals upgrade their voice infrastructure to adopt IP telephones such as the Cisco 7900 series, voice can be captured at a higher quality (16-kHz, 16-bit samples, instead of 8-kHz, 8-bit samples with current telephony technology), resulting in improved accuracy for speech recognition with no change in physician behavior. This would also reduce the editing load for transcriptionists.
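To put the sampling figures in perspective, the short calculation below compares the two formats. It assumes uncompressed, single-channel audio, which is a simplification, since real telephony and IP phone codecs may compress the signal; the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw bit rate of uncompressed audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def highest_captured_frequency_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the sample rate)."""
    return sample_rate_hz / 2


telephony = pcm_bit_rate(8_000, 8)    # 8-kHz, 8-bit samples
ip_phone = pcm_bit_rate(16_000, 16)   # 16-kHz, 16-bit samples

print(telephony, ip_phone, ip_phone / telephony)
print(highest_captured_frequency_hz(8_000), highest_captured_frequency_hz(16_000))
```

Under these assumptions the higher-quality capture carries four times as much raw data per second (256,000 versus 64,000 bits) and doubles the range of frequencies preserved, from 4 kHz to 8 kHz, giving the recognition engine more of the speech signal to work with.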

Saving the original. Storing the original dictated voice as part of the electronic medical record (indexed by all the same metainformation as the text report) is another way healthcare organizations are gaining comfort with speech-recognized documents. It will become mandatory for the electronic chart of the future to be a multimedia document, not just a replacement for current paper-based information.

Building in productivity tools like dictionaries, normals and word expanders tied to specialty disciplines also enhances efficiency. A word expander automatically offers to insert a complete word once its first few letters are typed.
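A word expander of this kind can be sketched as a prefix search over a sorted specialty word list. The class name, the three-letter minimum and the sample cardiology terms are assumptions made for illustration.

```python
from bisect import bisect_left


class WordExpander:
    """Suggest complete words from a specialty vocabulary once a few letters are typed."""

    def __init__(self, vocabulary, min_prefix=3):
        self._words = sorted(set(w.lower() for w in vocabulary))
        self._min_prefix = min_prefix

    def suggest(self, typed, limit=5):
        prefix = typed.lower()
        if len(prefix) < self._min_prefix:
            return []  # too few letters to offer anything useful
        # In a sorted list, all words sharing a prefix sit next to each other.
        start = bisect_left(self._words, prefix)
        matches = []
        for word in self._words[start:]:
            if not word.startswith(prefix) or len(matches) == limit:
                break
            matches.append(word)
        return matches


cardiology = WordExpander(
    ["cardiomegaly", "cardiomyopathy", "cardioversion", "carotid", "catheterization"]
)
print(cardiology.suggest("cardiom"))
```

Keeping a separate word list per specialty keeps the suggestions short and relevant, which is the point of tying these tools to specialty disciplines.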

Of course, transcriptionists will have to adapt their habits to work with reports generated by speech recognition. Changing a deeply ingrained behavior and then ramping up a new behavior to a productive level will be a critical step for transcriptionists to thrive in the new document-creation paradigm.