The effect of pauses in dysarthric speech recognition study on Thai cerebral palsy children

Supawat Suanpirintr, Nuttakorn Thubthong

doi:10.1145/1328491.1328530

Abstract

Dysarthric speech recognition (DSR) is under continuous development to improve the quality of life of people with speech impairment. This study investigated the effect of pauses on DSR. The speech corpus consists of 40 words in two subsets: (i) 20 bisyllabic words designed to contain all types of final consonant-initial consonant junction in the Thai language, and (ii) 20 monosyllabic words that share some phonemes with the first subset. Four children with cerebral palsy and dysarthria and two normal children participated. The recognizers were trained using Hidden Markov Models (HMMs) in three approaches: phoneme-based (PSR), word-based (WSR), and pause-reducing word-based (PRWSR). In the third approach, the pauses within words were automatically detected and shortened. The accuracy of PRWSR was compared with that of WSR while varying the duration of the pause left in place by PRWSR. Speech samples from the normal children were also recognized so that accuracy could be compared. The results showed that PSR provided the highest recognition rate. The recognition rates of WSR and PRWSR were not significantly different, although PRWSR gave a slightly higher rate than WSR. Among the remaining pause durations compared, 100 ms performed better than any other duration.
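
The abstract does not say how pauses were detected or shortened. As an illustration only, the sketch below assumes a simple short-time energy detector: frames whose RMS energy falls below a fraction of the loudest frame are treated as pause, and each run of pause frames is truncated to a fixed remaining duration (100 ms by default, matching the best setting reported above). The function name reduce_pauses, the 10 ms frame length, and the 0.1 energy threshold are hypothetical choices, not the authors' method.

```python
import numpy as np


def reduce_pauses(signal, sample_rate, keep_ms=100, frame_ms=10,
                  threshold_ratio=0.1):
    """Shorten every low-energy run in `signal` to at most `keep_ms`.

    Assumed detector (not from the paper): a frame is a pause when its
    RMS energy is below `threshold_ratio` times the loudest frame's RMS.
    Samples after the last whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len if frame_len > 0 else 0
    if n_frames == 0:
        return signal.copy()  # too short to analyse; return unchanged

    # Split into non-overlapping frames and compute RMS energy per frame.
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sqrt(np.mean(frames ** 2, axis=1))
    is_pause = energy < threshold_ratio * energy.max()

    # Keep speech frames, plus the first `keep_frames` frames of each pause.
    keep_frames = keep_ms // frame_ms
    kept = []
    run = 0  # length of the current run of pause frames
    for i in range(n_frames):
        run = run + 1 if is_pause[i] else 0
        if run <= keep_frames:
            kept.append(frames[i])
    return np.concatenate(kept)


if __name__ == "__main__":
    rate = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(int(0.2 * rate)) / rate)
    pause = np.zeros(int(0.5 * rate))
    word = np.concatenate([tone, pause, tone])  # 0.9 s with a 0.5 s pause
    shortened = reduce_pauses(word, rate, keep_ms=100)
    print(len(word) / rate, len(shortened) / rate)
```

Varying keep_ms in such a sketch corresponds to varying the remaining pause duration compared in the study; the shortened signal would then go through feature extraction and HMM training as usual.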
