Abstract

Improving the accuracy of personalised speech recognition for speakers with dysarthria is a challenging research field. In this paper, we explore an approach that non-linearly modifies speech tempo to reduce the mismatch between typical and atypical speech. Speech tempo analysis at the phonetic level is accomplished using a forced-alignment process from a traditional GMM-HMM automatic speech recognition (ASR) system. Estimated tempo adjustments are applied directly to the acoustic features rather than to the time-domain signals. Two approaches are considered: i) adjusting dysarthric speech towards typical speech for input into ASR systems trained with typical speech, and ii) adjusting typical speech towards dysarthric speech for data augmentation in personalised dysarthric ASR training. Experimental results show that the latter strategy, with data augmentation, is more effective, yielding a nearly 7% absolute improvement over the baseline speaker-dependent system evaluated on the UASpeech corpus. Consistent recognition performance improvements are observed across speakers, with the greatest benefit in cases of moderate and severe dysarthria.
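To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of feature-domain tempo modification: given phone boundaries from forced alignment and a per-phone tempo factor, each segment of the feature matrix is resampled along the time axis. The function names, the use of NumPy, and the choice of linear interpolation between frames are all illustrative assumptions; the per-segment factors varying phone by phone are what make the overall warping non-linear.

```python
import numpy as np

def warp_segment(feats: np.ndarray, factor: float) -> np.ndarray:
    """Resample one phone segment of acoustic features by a tempo factor.

    feats: (n_frames, n_dims) array for a single aligned phone segment.
    factor > 1 lengthens the segment (slower tempo); factor < 1 shortens it.
    Linear interpolation along the time axis is an illustrative choice.
    """
    n_in = feats.shape[0]
    n_out = max(1, int(round(n_in * factor)))
    # Fractional source positions in the original frame index space.
    src = np.linspace(0.0, n_in - 1, n_out)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    w = (src - lo)[:, None]
    return (1.0 - w) * feats[lo] + w * feats[hi]

def warp_utterance(feats, segments, factors):
    """Apply per-phone tempo factors and re-concatenate the utterance.

    segments: list of (start, end) frame indices from forced alignment.
    factors: one tempo ratio per segment, e.g. derived from the duration
             mismatch between typical and dysarthric realisations.
    """
    return np.concatenate([warp_segment(feats[s:e], f)
                           for (s, e), f in zip(segments, factors)])
```

Used for augmentation (strategy ii in the abstract), factors greater than 1 would stretch typical-speech phones towards the longer durations observed in a target dysarthric speaker before personalised training.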
