Abstract

Automatic speech recognition (ASR) could potentially improve communication by providing transcriptions of speech in real time. ASR is particularly useful for people with progressive disorders that lead to reduced speech intelligibility or difficulties performing motor tasks. ASR services are usually trained on healthy speech and may not be optimized for impaired speech, creating a barrier to accessing augmented assistance devices. We tested the performance of three state-of-the-art ASR platforms on two groups of people with neurodegenerative disease and healthy controls. We further examined individual differences that may explain errors in ASR services within groups, such as age and sex. Speakers were recorded while reading a standard text. Speech was elicited from individuals with multiple sclerosis, Friedreich’s ataxia, and healthy controls. Recordings were manually transcribed and compared to ASR transcriptions using Amazon Web Services, Google Cloud, and IBM Watson. Accuracy was measured as the proportion of words that were correctly classified. ASR accuracy was higher for controls than for clinical groups, and higher for multiple sclerosis than for Friedreich’s ataxia across all ASR services. Amazon Web Services and Google Cloud yielded higher accuracy than IBM Watson. ASR accuracy decreased with increased disease duration. Age and sex did not significantly affect ASR accuracy. ASR faces challenges for people with neuromuscular disorders. Until improvements are made in recognizing less intelligible speech, the true value of ASR for people requiring augmented assistance devices and alternative communication remains unrealized. We suggest potential methods to improve ASR for those with impaired speech.
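The abstract defines accuracy as the proportion of words correctly recognised when each ASR transcript is compared against a manual transcript. As a rough illustration only, and not the authors' scoring pipeline, the Python sketch below aligns a reference transcript with an ASR hypothesis using word-level Levenshtein distance and reports the common WER-based accuracy (1 − WER), which approximates a proportion-correct measure. The word_accuracy function name and the example sentences are hypothetical.

```python
# Minimal sketch (assumed, not the study's actual scoring code): word-level
# accuracy from a Levenshtein alignment between a manual reference transcript
# and an ASR hypothesis.

def word_accuracy(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0
    # dp[i][j] = minimum word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    errors = dp[len(ref)][len(hyp)]  # substitutions + deletions + insertions
    # WER-based accuracy, floored at zero when insertions push the error
    # count past the reference length.
    return max(0.0, 1.0 - errors / len(ref))


if __name__ == "__main__":
    # Hypothetical reference and ASR output for a short read passage.
    manual = "the north wind and the sun were disputing which was the stronger"
    asr = "the north wind and a sun were disputing which was stronger"
    print(f"word accuracy: {word_accuracy(manual, asr):.2f}")
```

In this toy example one substitution ("the" → "a") and one deletion ("the") against a 12-word reference give an accuracy of about 0.83; the paper's reported values are not reproducible from this sketch.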

Highlights

  • Automatic speech recognition (ASR) systems help digital machines interpret speech and automate human tasks, such as typing text and performing web searches

  • These results suggest that ASR services have greater difficulty transcribing consecutive words, regardless of whether speech is impaired

  • As speech recognition accuracy was lower for Friedreich’s ataxia (FA) than for multiple sclerosis (MS), the results suggest that the severity of the disease influenced accuracy

Introduction

Automatic speech recognition (ASR) systems help digital machines interpret speech and automate human tasks, such as typing text and performing web searches. ASR services may lose their proficiency with impaired speech due to its underrepresentation in training datasets. This leads to increased errors in ASR for dysarthric speech (De Russis & Corno, 2019; Mengistu & Rudzicz, 2011; Rosen & Yampolsky, 2000; Young & Mihailidis, 2010). Some studies have examined ASR accuracy using databases of dysarthric speech (e.g., the TORGO database) but do not differentiate between causes of dysarthria (e.g., cerebral palsy and amyotrophic lateral sclerosis), which may produce different speech recognition errors (De Russis & Corno, 2019; Mengistu & Rudzicz, 2011). Speech in FA is characterized by reduced pitch variation, reduced loudness control, impaired timing, strained voice quality, reduced breath support, hypernasality, and imprecise production of consonants (Folker et al., 2010; Poole et al., 2015; Vogel et al., 2017).
