Abstract

Speech production assessment in disordered speech relies on tests such as intelligibility and/or comprehensibility tests. These tests are subjective and time-consuming for both patients and practitioners. In this paper, we report on the use of automatically derived pronunciation scores to predict comprehensibility ratings on a pilot development corpus of 120 utterances recorded by 12 speakers with distinct pathologies. We found high correlation values (0.81) between Goodness Of Pronunciation (GOP) scores and comprehensibility ratings. We compared a baseline implementation of the GOP algorithm with a variant called forced-GOP, which yielded better results. A linear regression model predicted comprehensibility scores with a 20.9% relative error with respect to the reference scores given by two expert judges. The correlation between the manual and the predicted scores was 0.74. Most of the prediction errors concern the speakers with the most extreme ratings (the lowest or the highest values): the range of predicted scores was globally narrower than that of the manual scores, owing to the simplicity of the model.
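
The evaluation loop described above (fit a linear model from GOP to comprehensibility, then report the correlation and the relative error against expert ratings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes one mean GOP value and one mean comprehensibility rating per speaker, obtains predictions by leave-one-speaker-out refitting, and defines relative error as the mean of |predicted - reference| / reference. The protocol and error definition used in the paper may differ, and all function names are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def mean_relative_error(pred, ref):
    """Assumed definition: mean of |pred - ref| / |ref| (no zero references)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref) / np.abs(ref)))


def loo_predictions(gop, comp):
    """Leave-one-speaker-out: refit the line without speaker i, then
    predict speaker i's comprehensibility from their mean GOP.
    Assumption: one mean GOP and one mean rating per speaker."""
    gop = np.asarray(gop, dtype=float)
    comp = np.asarray(comp, dtype=float)
    preds = np.empty_like(comp)
    for i in range(len(gop)):
        keep = np.arange(len(gop)) != i
        slope, intercept = fit_line(gop[keep], comp[keep])
        preds[i] = slope * gop[i] + intercept
    return preds


def evaluate(gop, comp):
    """Return (r between GOP and ratings, r between predicted and manual
    ratings, mean relative error of the predictions)."""
    preds = loo_predictions(gop, comp)
    return (
        pearson_r(gop, comp),
        pearson_r(preds, comp),
        mean_relative_error(preds, comp),
    )
```

With in-sample ordinary least squares, the standard deviation of the fitted values equals |r| times that of the reference ratings, so any correlation below 1 pulls predictions toward the mean. This is one plausible way to read the narrower range of predicted scores reported in the abstract.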

Highlights

  • The assessment of speech production abilities in motor speech disorders relies almost exclusively on subjective tests such as intelligibility tests

  • This study aims at verifying the ability of Goodness Of Pronunciation (GOP) measures to predict disordered speech comprehensibility

  • Mean reaction time (RT) tends to increase with mean GOP and Forced-aligned GOP (F-GOP) scores, whereas mean comprehensibility appears to decrease as GOP and F-GOP scores increase
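
The direction of these trends follows from how GOP is usually defined: a duration-normalised log ratio between the likelihood of the expected phone and that of the most likely phone over the same segment, so 0 means a perfect match and larger values mean a worse one. The sketch below is a frame-level posterior approximation of that idea, not the authors' GOP or F-GOP implementation. It assumes frame-level phone log posteriors from some acoustic model and phone boundaries from a forced alignment; the function names and the (phone_id, start, end) alignment format are hypothetical.

```python
import numpy as np


def frame_gop(log_post, phone_id, start, end):
    """Approximate GOP for one phone segment.

    log_post: array (num_frames, num_phones) of log P(q | o_t), assumed
              to come from an acoustic model.
    phone_id: index of the canonical (expected) phone.
    start, end: segment boundaries from forced alignment (end exclusive).
    Returns a non-negative score: 0 when the expected phone is the most
    likely phone in every frame, larger for a worse match.
    """
    seg = np.asarray(log_post[start:end], dtype=float)
    if seg.shape[0] == 0:
        raise ValueError("empty segment")
    target = seg[:, phone_id]   # log posterior of the expected phone
    best = seg.max(axis=1)      # log posterior of the best phone per frame
    # Normalise by the number of frames (segment duration).
    return float(np.abs(np.sum(target - best)) / seg.shape[0])


def utterance_gop(log_post, alignment):
    """Mean phone-level GOP over an utterance.
    alignment: hypothetical list of (phone_id, start, end) tuples."""
    scores = [frame_gop(log_post, p, s, e) for p, s, e in alignment]
    return float(np.mean(scores))
```

Averaging phone scores per utterance, then per speaker, gives one value that can be set against comprehensibility ratings or reaction times.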

Summary

Introduction

The assessment of speech production abilities in motor speech disorders relies almost exclusively on subjective tests such as intelligibility tests. These tests are very time-consuming and rely on subjective judgments: speakers read lists of words or sentences while one or more judges rate their productions. Within this framework, automatic methods for speaker evaluation appear to be practical alternatives. ASR techniques developed to assess the pronunciation skills of foreign language learners have focused on both the segmental and the suprasegmental levels, giving rise to two research fields, respectively called individual error detection and overall pronunciation assessment [1]. For individual error detection (i.e., the automatic detection of mispronounced phones), two kinds of methods are used:
