Abstract

It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. Considerable effort has therefore gone into the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in applying objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods that can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that builds on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
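To make the reported error measure concrete, the following minimal sketch shows how a root mean squared error and a Pearson correlation between perceived and computed intelligibility scores could be calculated. The function names and the example scores are hypothetical illustrations, not the authors' code or data; scores are assumed to lie on the 0 to 100 scale mentioned above.

```python
import math


def rmse(perceived, computed):
    """Root mean squared error between two equally long score lists."""
    if len(perceived) != len(computed) or not perceived:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = [(p - c) ** 2 for p, c in zip(perceived, computed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for three speakers (illustration only, not data from the paper).
perceived_scores = [50.0, 60.0, 70.0]
computed_scores = [58.0, 52.0, 78.0]

print("RMSE:", rmse(perceived_scores, computed_scores))
print("Pearson r:", pearson(perceived_scores, computed_scores))
```

Reporting both measures is useful because they answer different questions: the correlation shows whether the computed scores rank speakers consistently with the listeners, while the RMSE shows how far the computed scores deviate in absolute terms on the rating scale.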

Highlights

  • In clinical practice there is a great demand for fast and reliable methods for assessing the communication efficiency of a person with a speech disorder

  • The first subsystem generated 55 phonemic features (PMF-tri) originating from acoustic scores computed by state-of-the-art triphone acoustic models in the mel-frequency cepstral coefficients (MFCC) feature space

  • The speaker features of the two subsystems could be combined before they were supplied to the intelligibility prediction model

Summary

Introduction

In clinical practice there is a great demand for fast and reliable methods for assessing the communication efficiency of a person with a (pathological) speech disorder. The outcome of our experiments was that the correlations between the perceptual and the computed scores were only moderate [13]. This is in line with our expectations, since the ASR employs acoustic models that were trained on the speech of nonpathological speakers. We formerly developed an initial version of our system [13], and we were able to demonstrate that its computed intelligibilities correlated well with perceived phone-level intelligibilities [14] for our speech material. These good correlations could only be attained with a system incorporating two distinct ASR components: one working directly in the acoustic feature space and one working in the phonological feature space. The paper ends with a conclusion and some directions for future work.

Perceptual Test and Evaluation Database
An Automatic Intelligibility Measurement System
Speaker Feature Extraction
Results and Discussion
Conclusions and Future Work