Abstract

Dysarthria is a frequently occurring motor speech disorder that can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, the spoken communication of dysarthric speakers is seriously restricted, which affects their quality of life and confidence. Assistive technology has led to the development of speech applications that improve the spoken communication of dysarthric speakers. Within this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected to different degrees. The approach therefore consists of finding the most suitable type of HMM topology (Bakis or Ergodic) for each phoneme in the speaker's phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This differs from studies in which a single topology is assumed for all phonemes. The search for suitable parameters (topology and mixture components) is performed with a Genetic Algorithm (GA). Experiments with a well-known dysarthric speech database showed statistically significant improvements for the proposed approach over the single-topology approach, even for speakers with severe dysarthria.
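To make the two topology types concrete, the following is a minimal sketch of how initial transition matrices could be built for each one and how a per-phoneme configuration might be represented. It is not the paper's implementation. The function names, the `candidate` dictionary, the phoneme labels, and the uniform initial probabilities are all illustrative assumptions. The Bakis topology is assumed here to be a left-to-right model that allows a self-loop, a move to the next state, and a skip of one state; the paper's exact definition may differ.

```python
import numpy as np

def bakis_transitions(n_states, max_jump=2):
    """Left-to-right matrix: state i may only reach states i..i+max_jump.

    Assumption: allowed transitions start out equally likely.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        last = min(i + max_jump, n_states - 1)
        A[i, i:last + 1] = 1.0 / (last - i + 1)
    return A

def ergodic_transitions(n_states):
    """Fully connected matrix: every state may reach every state."""
    return np.full((n_states, n_states), 1.0 / n_states)

def initial_transitions(topology, n_states):
    """Dispatch on the topology name (hypothetical labels)."""
    builders = {"bakis": bakis_transitions, "ergodic": ergodic_transitions}
    return builders[topology](n_states)

# Hypothetical per-phoneme configuration: the kind of candidate solution
# a GA could evolve (topology, number of states, mixture components).
candidate = {
    "aa": {"topology": "bakis", "n_states": 3, "n_mixtures": 8},
    "s": {"topology": "ergodic", "n_states": 4, "n_mixtures": 4},
}

matrices = {
    phoneme: initial_transitions(cfg["topology"], cfg["n_states"])
    for phoneme, cfg in candidate.items()
}
```

In a GA setting, each candidate of this kind would be scored by training the corresponding HMMs and measuring recognition accuracy; that fitness step depends on the recognition toolkit and is left out of the sketch.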

Highlights

  • The term dysarthria was initially defined as “a collective name for a group of speech disorders resulting from disturbances in muscular control over the speech mechanism due to damage of the central or peripheral nervous system” [1, 2]

  • Dysarthria is described as an impairment in one or more of the processes involved in speech production: respiration, phonation, resonance, articulation, and prosody [3]

  • The damage of the nervous system that leads to dysarthria can be caused by congenital disorders, cerebrovascular accident (CVA), traumatic brain injury (TBI), or degenerative neurological disease such as Parkinson’s or Alzheimer’s disease

Summary

Introduction

The term dysarthria was initially defined as “a collective name for a group of speech disorders resulting from disturbances in muscular control over the speech mechanism due to damage of the central or peripheral nervous system” [1, 2]. ASR technologies aim to recognize the sentences spoken by a dysarthric speaker more accurately, independently of the severity of the dysarthria. This is very important for the development of applications (such as those described above) whose objective is to improve communication and interaction with other people or with other assistive systems. This information was integrated into the ASR process to correct those confusion errors (deletion, substitution, and/or insertion of phonemes) and provide a more accurate response [18, 26, 36, 37, 38]. This approach performed better than approaches based on speaker adaptation techniques (such as those used by commercial ASR systems) because, as noted in [39], these techniques are insufficient to deal with the abnormalities present in dysarthric speech.

HMM Parameters for Optimization
Optimization Method
Experiments on Dysarthric Speech
Findings
Discussion and Future