Abstract

Traditionally, acoustical assessment of voice disorders relies on simple and homogeneous speech samples such as sustained vowels. Continuous speech is believed to be more representative of everyday voice use and is therefore preferable in clinical practice. This paper describes an attempt to automate voice assessment using continuous speech utterances. The proposed system makes use of a novel type of feature derived from the phone posterior probabilities output by a deep neural network (DNN) based automatic speech recognition (ASR) system. These ASR-based voice features are designed to effectively quantify the mismatch between disordered voice and normal voice. Prediction of voice disorder severity is carried out first at the utterance level, and the prediction scores of a subject's individual utterances are then combined to give an overall assessment of the subject. With a low-dimensional ASR-based feature vector, the utterance-level prediction accuracy is comparable to that of conventional features with a much higher dimension. By jointly using the ASR-based features and conventional voice features, a subject-level prediction accuracy of over 80% on three severity classes can be achieved. Subjects with mild disorder and those with severe disorder could be perfectly distinguished by the proposed method.
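The abstract describes the pipeline only at a high level. The following is a minimal sketch of that flow, assuming phone posteriors have already been extracted as a frames-by-phones matrix per utterance. The specific feature definitions (posterior entropy and peak-posterior statistics), the SVM classifier, and the majority-vote fusion rule are illustrative assumptions, not the paper's actual method.

    # Minimal sketch of the described pipeline (not the authors' implementation).
    # Assumes each utterance is represented by a (frames x phones) posterior matrix.
    import numpy as np
    from sklearn.svm import SVC

    def asr_mismatch_features(posteriors, eps=1e-10):
        """Summarize frame-level phone posteriors into a low-dimensional
        utterance vector; flatter posteriors suggest ASR/voice mismatch."""
        p = np.clip(posteriors, eps, 1.0)
        entropy = -(p * np.log(p)).sum(axis=1)   # per-frame uncertainty
        top1 = p.max(axis=1)                     # per-frame peak posterior
        return np.array([entropy.mean(), entropy.std(),
                         top1.mean(), top1.std()])

    def subject_score(utterance_predictions):
        # Combine utterance-level predictions into one subject-level decision
        # (majority vote here; the paper's fusion rule may differ).
        vals, counts = np.unique(utterance_predictions, return_counts=True)
        return vals[np.argmax(counts)]

    # Toy usage: random posteriors over 40 phones, 200 frames per utterance.
    rng = np.random.default_rng(0)
    X = np.stack([asr_mismatch_features(rng.dirichlet(np.ones(40), size=200))
                  for _ in range(20)])
    y = np.array([0] * 10 + [1] * 10)            # utterance severity labels
    clf = SVC().fit(X, y)
    print(subject_score(clf.predict(X[:5])))     # one subject, 5 utterances

In practice the utterance-level classifier would be trained on labeled disordered-speech data, and the conventional voice features mentioned in the abstract could be concatenated with this vector before classification.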
