Abstract

Post-stroke dysarthria (PSD) is a common and persistent sequela of stroke. Pathological voice recognition technology has emerged to support the objective assessment of dysarthria. However, most existing methods are limited by a shortage of pathological voice samples and achieve low recognition rates. To address these problems, this paper proposes a hybrid recognition model that combines a 1DCNN with a Double-LSTM (DLSTM) network and operates on the MFCC features of pathological voices. First, a database of syllable pronunciations is constructed from Mandarin-speaking participants, including normal adults (NA) and patients with PSD. Next, a 1DCNN is applied to the MFCC features of the syllable pronunciations to extract deeper hidden features. A DLSTM-based recognition model then uses these hidden features to perform syllable-level classification. Finally, an aggregation technique is applied to the syllable-level results to determine, at the speaker level, whether the speaker belongs to the PSD category. The experimental results show that further processing of the MFCC features by the 1DCNN significantly improves the performance of the DLSTM-based recognition model, which reaches an accuracy of 82.1% at the syllable level and 97.4% at the speaker level. Among the deep learning networks compared, MFCC-based models outperform spectrogram-based models, and combining the 1DCNN with the DLSTM does not improve the performance of spectrogram-based models. The proposed method can therefore improve the efficiency of treatment for speech disorders and effectively assist in the diagnosis of PSD.
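
The final step, moving from syllable-level predictions to a speaker-level decision, can be illustrated with a minimal sketch. The abstract does not specify which aggregation technique is used, so the majority vote below is an assumption, as are the function name `aggregate_speaker_level`, the `threshold` parameter, the label encoding (1 = PSD, 0 = NA), and the example speaker IDs.

```python
from collections import defaultdict


def aggregate_speaker_level(syllable_predictions, threshold=0.5):
    """Aggregate syllable-level labels into one label per speaker.

    syllable_predictions: iterable of (speaker_id, label) pairs, where
    label is 1 for PSD and 0 for NA (an assumed encoding).
    threshold: fraction of PSD-labelled syllables above which the speaker
    is assigned to PSD (assumed; a simple majority vote uses 0.5).
    """
    votes = defaultdict(list)
    for speaker_id, label in syllable_predictions:
        votes[speaker_id].append(label)

    # A speaker is labelled PSD when the share of PSD syllables exceeds the threshold.
    return {
        speaker: "PSD" if sum(labels) / len(labels) > threshold else "NA"
        for speaker, labels in votes.items()
    }


# Hypothetical syllable-level outputs for two speakers.
example = [
    ("s01", 1), ("s01", 1), ("s01", 0),
    ("s02", 0), ("s02", 0), ("s02", 1),
]
speaker_labels = aggregate_speaker_level(example)
```

Under a scheme of this kind, isolated syllable-level errors can be outvoted by the remaining syllables of the same speaker, which is one plausible reason the reported speaker-level accuracy is higher than the syllable-level accuracy.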
