Abstract

Automatic speech recognition (ASR) is used in many assistive technologies, such as helping individuals with speech impairments to communicate. One challenge in ASR for speech-impaired individuals is the difficulty of obtaining a good speech database of impaired speakers for building an effective acoustic model. Because the few existing databases of impaired speech are limited in size, the obvious way to build an acoustic model of impaired speech is to employ adaptation techniques. However, two issues have not been addressed in existing studies of adaptation for speech impairment: (1) identifying the most effective adaptation technique for impaired speech; and (2) choosing suitable source models from which to build an effective impaired-speech acoustic model. This research investigates these two issues for dysarthria, a type of speech impairment affecting millions of people. We used both unimpaired and impaired speech as the source model with well-known adaptation techniques, namely maximum likelihood linear regression (MLLR) and constrained MLLR (C-MLLR). The recognition accuracy of each impaired-speech acoustic model is measured in terms of word error rate (WER), with further assessments including phoneme insertion, substitution and deletion rates. Unimpaired speech, when combined with limited high-quality impaired-speech data, improves the performance of ASR systems in recognising severely impaired dysarthric speech. Based on statistical analysis of the WER, the C-MLLR adaptation technique was also found to be better than MLLR in recognising mildly and moderately impaired speech. Phoneme substitution was found to be the biggest contributor to WER in dysarthric speech at all levels of severity. The results show that acoustic models derived with suitable adaptation techniques improve the performance of ASR systems in recognising impaired speech with limited adaptation data.
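
As a concrete illustration of the evaluation described above, the sketch below computes WER from a minimum-edit (Levenshtein) alignment and breaks the errors down into substitutions, deletions and insertions, the same decomposition the paper applies at the phoneme level. This is an illustrative reimplementation of the standard metric, not the authors' scoring pipeline; the function names and example sentences are invented for the sketch.

    # Illustrative WER computation (not the authors' scoring tool): align the reference
    # and hypothesis word sequences with dynamic programming and count substitutions (S),
    # deletions (D) and insertions (I); WER = (S + D + I) / number of reference words.

    def edit_counts(ref, hyp):
        """Return (S, D, I) for a minimum-edit alignment of two token lists."""
        rows, cols = len(ref) + 1, len(hyp) + 1
        # Each cell stores (total_cost, S, D, I) for aligning ref[:i] with hyp[:j].
        dp = [[None] * cols for _ in range(rows)]
        dp[0][0] = (0, 0, 0, 0)
        for i in range(1, rows):
            dp[i][0] = (i, 0, i, 0)                      # only deletions
        for j in range(1, cols):
            dp[0][j] = (j, 0, 0, j)                      # only insertions
        for i in range(1, rows):
            for j in range(1, cols):
                if ref[i - 1] == hyp[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1]          # match: no new error
                    continue
                c, s, d, ins = dp[i - 1][j - 1]
                candidates = [(c + 1, s + 1, d, ins)]    # substitution
                c, s, d, ins = dp[i - 1][j]
                candidates.append((c + 1, s, d + 1, ins))   # deletion
                c, s, d, ins = dp[i][j - 1]
                candidates.append((c + 1, s, d, ins + 1))   # insertion
                dp[i][j] = min(candidates)               # tuple min compares cost first
        _, s, d, ins = dp[rows - 1][cols - 1]
        return s, d, ins

    def wer(reference, hypothesis):
        ref = reference.split()
        s, d, ins = edit_counts(ref, hypothesis.split())
        return (s + d + ins) / max(len(ref), 1)

    # One substitution and one deletion against a five-word reference -> WER = 0.4
    print(wer("please call the nurse now", "please call a nurse"))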

Highlights

  • Speech is second nature for most of us, to the extent that we cannot imagine what life would be like without it; speech communication is a vital skill in our society

  • The constrained MLLR (CMLLR) technique shows a lower word error rate (WER) than the maximum likelihood linear regression (MLLR) technique for both the TIMIT- and TORGO-adapted models

  • For both the TIMIT and TORGO databases, we found a significant difference in WER between MLLR and CMLLR
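
For reference, the two adaptation techniques compared in these highlights are conventionally written as follows. This is the standard textbook formulation, not the paper's own notation: MLLR adapts the Gaussian means of the source acoustic model with an affine transform estimated by maximum likelihood on the adaptation data, while constrained MLLR (C-MLLR) ties the mean and covariance transforms together, which makes it equivalent to a feature-space transform.

    % Standard MLLR / C-MLLR formulation (textbook notation, not the paper's).
    % MLLR adapts each Gaussian mean \mu with an affine transform:
    \[
        \hat{\mu} = A\mu + b .
    \]
    % C-MLLR constrains the mean and covariance transforms to share one matrix,
    \[
        \hat{\mu} = A\mu + b , \qquad \hat{\Sigma} = A\,\Sigma\,A^{\mathsf{T}} ,
    \]
    % which is equivalent to an affine transform of the observation vectors themselves,
    % with a Jacobian term |A|^{-1} added to the likelihood:
    \[
        \hat{o}_t = A^{-1}\,(o_t - b) .
    \]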

Introduction

Speech is second nature for most of us, to the extent that we cannot imagine what life would be like without it; speech communication is a vital skill in our society. The inability to communicate verbally is a serious disability that can drastically affect a person's life. Speech impairment deprives a person of the ability to communicate with others, and severe speech impairment can be frustrating for both sufferers and listeners. Several studies show that about 60% of individuals with speech impairments have difficulty communicating orally with others; such disability severely affects their social life [1]. Some sufferers can learn and make sound judgments, but, because of their poor speaking ability, they have difficulty communicating with others; this affects their ability to learn and restricts their chances of gaining a proper education.

