Retrospective Analysis of Clinical Performance of an Estonian Speech Recognition System for Radiology: Effects of Different Acoustic and Language Models

A. Paats, I. Fridolin, T. Alumäe, E. Meister

doi:10.1007/s10278-018-0085-8


Journal: Journal of Digital Imaging
Publication Date: Apr 30, 2018
Citations: 8
License type: open-access

Affiliation: Tallinn University of Technology


Abstract


The aim of this study was to retrospectively analyze the influence of different acoustic and language models in order to determine which had the greatest effect on the clinical performance of a non-commercial, radiology-oriented automatic speech recognition (ASR) system for the Estonian language. The ASR system was developed for the Estonian language in the radiology domain using open-source software components (Kaldi toolkit, Thrax). It was trained on real radiology text reports and dictations collected during the development phases. The final version of the ASR system was tested by 11 radiologists, who dictated 219 reports in total, in a spontaneous manner, in a real clinical environment. The audio files collected in the final phase were used to measure the performance of different versions of the ASR system retrospectively. ASR system versions were evaluated by word error rate (WER) for each speaker and modality and by the WER difference between the first and the last version of the ASR system. The total average WER across all material improved from 18.4% for the first version (v1) to 5.8% for the last version (v8), which corresponds to a relative improvement of 68.5%. WER improvement was strongly related to modality and radiologist. In summary, the performance of the final ASR system version was close to optimal, delivering similar results across all modalities and being independent of the user, the complexity of the radiology reports, user experience, and speech characteristics.
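For readers unfamiliar with the metric, the following is a minimal sketch of how WER and the relative improvement quoted above can be computed. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and whitespace tokenization. The function names `word_error_rate` and `relative_improvement` are hypothetical, and the scoring tool and averaging scheme actually used in the study are not described in this excerpt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(wer_first: float, wer_last: float) -> float:
    """Fraction by which WER decreased from the first to the last version."""
    return (wer_first - wer_last) / wer_first


# Toy example (not data from the study): one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5

# Figures from the abstract: 18.4% (v1) and 5.8% (v8).
print(round(100 * relative_improvement(18.4, 5.8), 1))  # 68.5
```

The second call reproduces the 68.5% relative improvement reported in the abstract from the two average WER values.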

Highlights

In the modern healthcare system, computers and electronic healthcare records are used extensively
Feedback was collected from daily automatic speech recognition (ASR) system users [7], and the ASR system characteristics were modified to minimize errors by enhancing the acoustic model (AM) or the language model (LM) (Table 1), as reported in detail earlier [12]
In the first version (v1) of the ASR system with Gaussian mixture model (GMM) acoustic model and language model trained on 1-year reports, the total word error rate (WER) was 18.4% (SD 18.6)

Summary

Introduction

In the modern healthcare system, computers and electronic healthcare records are used extensively. Radiology is the most computerized specialty and a pioneer among clinical fields in using diagnostic workstations for image interpretation and radiology information systems for documenting findings. Demographic changes, including population aging, increase the demand for healthcare services. This trend has influenced radiology, where, for example, in Estonia, the number of radiology procedures carried out between 2010 and 2015 increased. To meet all patient needs with limited resources, radiologists need a more effective way of reporting on images. In Estonia, radiologists manually type the results of visual findings and quantitative measurements of a study as a textual report. Automatic speech recognition (ASR) has been shown to be a valid alternative to traditional keyboard-based text entry in radiology reporting, which can improve patient care and resource management in the form of reduced report turnaround times, reduced staffing needs, and the efficient completion and distribution of reports [2, 3].
