Abstract
Recent dysarthric speech recognition studies using mixed data from a collection of neurological diseases suggested that articulatory data can help improve speech recognition performance. This project was specifically designed for the speaker-independent recognition of dysarthric speech due to amyotrophic lateral sclerosis (ALS) using articulatory data. In this paper, we investigated three across-speaker normalization approaches in the acoustic space, the articulatory space, or both: Procrustes matching (a physiological approach in articulatory space), vocal tract length normalization (a data-driven approach in acoustic space), and feature space maximum likelihood linear regression (a model-based approach for both spaces), to address the high degree of articulatory variation across speakers. A preliminary ALS data set was collected and used to evaluate the approaches. Two recognizers, Gaussian mixture model (GMM) - hidden Markov model (HMM) and deep neural network (DNN) - HMM, were used. Experimental results showed that adding articulatory data significantly reduced the phoneme error rates (PERs) using any individual or combined normalization approach. DNN-HMM outperformed GMM-HMM in all configurations. The best performance (30.7% PER) was obtained by triphone DNN-HMM + acoustic and articulatory data + all three normalization approaches, a 15.3% absolute PER reduction from the baseline using triphone GMM-HMM + acoustic data.
Index Terms: Dysarthric speech recognition, Procrustes matching, vocal tract length normalization, fMLLR, hidden Markov models, deep neural network
Highlights
Although automatic speech recognition (ASR) technologies are commercially available for healthy talkers, they do not perform satisfactorily when used directly by talkers with dysarthria, a motor speech disorder due to neurological or other injury [1]
These results suggest that vocal tract length normalization (VTLN), Procrustes matching, and feature space maximum likelihood linear regression (fMLLR) were all effective for speaker-independent dysarthric speech recognition from acoustic data, articulatory data, or both combined
This paper investigated speaker-independent dysarthric speech recognition using data from patients with amyotrophic lateral sclerosis (ALS) and three across-speaker normalization approaches: Procrustes matching (a physiological approach), VTLN (a data-driven approach), and fMLLR (a model-based approach)
Summary
Although automatic speech recognition (ASR) technologies are commercially available for healthy talkers, they do not perform satisfactorily when used directly by talkers with dysarthria, a motor speech disorder due to neurological or other injury [1]. Recent studies indicated that Procrustes matching was effective for speaker-independent silent speech recognition (i.e., recognizing speech from articulatory data only) [18, 19]. We investigated the use of 1) articulatory data as an additional information source for speech, 2) Procrustes matching, VTLN, and fMLLR as feature normalization approaches, individually or combined, and 3) two machine learning classifiers, GMM-HMM and DNN-HMM. The effectiveness of these speaker-independent dysarthric speech recognition approaches was evaluated on a preliminary data set collected from multiple early-diagnosed ALS patients
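To illustrate the Procrustes matching idea described above — aligning one speaker's articulatory point set to a reference shape by removing translation, scale, and rotation differences — the following is a minimal NumPy sketch. The function name and the toy reference shape are hypothetical; the paper's actual sensor setup and preprocessing may differ.

```python
import numpy as np

def procrustes_normalize(points, reference):
    """Align a speaker's (n, d) articulatory point set to a reference
    shape via Procrustes matching: remove translation, scale, rotation.
    Hypothetical helper -- illustrative only."""
    # Step 1: translate both shapes so their centroids sit at the origin
    p = points - points.mean(axis=0)
    r = reference - reference.mean(axis=0)
    # Step 2: normalize scale using the Frobenius norm
    p = p / np.linalg.norm(p)
    r = r / np.linalg.norm(r)
    # Step 3: solve the orthogonal Procrustes problem for the rotation
    # that best maps p onto r (SVD of the cross-covariance p^T r)
    u, _, vt = np.linalg.svd(p.T @ r)
    rotation = u @ vt
    return p @ rotation

# Toy check: a shifted, scaled, rotated copy aligns back to the reference
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
speaker = 2.0 * ref @ rot.T + np.array([5.0, -3.0])
aligned = procrustes_normalize(speaker, ref)
ref_norm = (ref - ref.mean(axis=0)) / np.linalg.norm(ref - ref.mean(axis=0))
# aligned now matches ref_norm up to numerical error
```

After this normalization, articulatory features from different speakers live in a common shape space, which is what makes the across-speaker (speaker-independent) training described in the summary feasible.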