Abstract

Traditionally, acoustical assessment of voice disorders relies on simple and homogeneous speech samples such as sustained vowels. Continuous speech is believed to be more representative of everyday voice use and is therefore preferable in clinical practice. This paper describes an attempt to automate voice assessment using continuous speech utterances. The proposed system makes use of a novel type of feature derived from the phone posterior probabilities output by a deep neural network (DNN) based automatic speech recognition (ASR) system. These ASR-based voice features are designed to effectively quantify the mismatch between disordered voice and normal voice. Prediction of voice disorder severity is carried out first at the utterance level, and the prediction scores of a subject's individual utterances are then combined to give an overall assessment of the subject. With a low-dimensional ASR-based feature vector, the utterance-level prediction accuracy is comparable to that of conventional features with a much higher dimension. By jointly using the ASR-based features and conventional voice features, a subject-level prediction accuracy of over 80% on three severity classes can be achieved. Subjects with mild disorder and those with severe disorder could be perfectly distinguished by the proposed method.
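The abstract describes the pipeline only at a high level. The following is a minimal sketch of that flow, assuming phone posteriors have already been extracted as a frames-by-phones matrix per utterance. The specific feature definitions (posterior entropy and peak-posterior statistics), the SVM classifier, and the majority-vote fusion rule are illustrative assumptions, not the paper's actual method.

    # Minimal sketch of the described pipeline (not the authors' implementation).
    # Assumes each utterance is represented by a (frames x phones) posterior matrix.
    import numpy as np
    from sklearn.svm import SVC

    def asr_mismatch_features(posteriors, eps=1e-10):
        """Summarize frame-level phone posteriors into a low-dimensional
        utterance vector; flatter posteriors suggest ASR/voice mismatch."""
        p = np.clip(posteriors, eps, 1.0)
        entropy = -(p * np.log(p)).sum(axis=1)   # per-frame uncertainty
        top1 = p.max(axis=1)                     # per-frame peak posterior
        return np.array([entropy.mean(), entropy.std(),
                         top1.mean(), top1.std()])

    def subject_score(utterance_predictions):
        # Combine utterance-level predictions into one subject-level decision
        # (majority vote here; the paper's fusion rule may differ).
        vals, counts = np.unique(utterance_predictions, return_counts=True)
        return vals[np.argmax(counts)]

    # Toy usage: random posteriors over 40 phones, 200 frames per utterance.
    rng = np.random.default_rng(0)
    X = np.stack([asr_mismatch_features(rng.dirichlet(np.ones(40), size=200))
                  for _ in range(20)])
    y = np.array([0] * 10 + [1] * 10)            # utterance severity labels
    clf = SVC().fit(X, y)
    print(subject_score(clf.predict(X[:5])))     # one subject, 5 utterances

In practice the utterance-level classifier would be trained on labeled disordered-speech data, and the conventional voice features mentioned in the abstract could be concatenated with this vector before classification.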
