Abstract

A preprocessing scheme based on the linear prediction coefficient (LPC) residual is applied to higher-order statistics (HOSs) for automatic assessment of overall pathological voice quality. The normalized skewness and kurtosis are estimated from the LPC residual and show statistically meaningful distributions that characterize pathological voice quality. The study uses 83 voice samples of the sustained vowel /a/, each independently assessed by a speech and language therapist (SALT) according to the grade of severity of dysphonia on the GRBAS scale. These samples are used to train and test a classification and regression tree (CART). The best result, an accuracy of 92.9%, is obtained with an optimal decision tree built on a combination of the normalized skewness and kurtosis. It is concluded that the method can serve as an assessment tool, providing a valuable aid to the SALT during clinical evaluation of overall pathological voice quality.
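The feature extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion for the LPC fit, processes the whole signal rather than frames, and uses the common moment-ratio definitions of normalized skewness and kurtosis, since the abstract does not state the exact normalization. All function names, including the synthetic test signal generator, are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values.

    Returns a with a[0] = 1, so that e[n] = sum_k a[k] * x[n - k]
    is the prediction residual. Assumes r[0] > 0 (non-silent input).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error power
    return a


def lpc_residual(x, order):
    """Inverse-filter x with its own LPC polynomial (assumed whole-signal fit)."""
    a = levinson_durbin(autocorrelation(x, order), order)
    return [sum(a[k] * x[n - k] for k in range(order + 1))
            for n in range(order, len(x))]


def normalized_skewness_kurtosis(e):
    """Moment-ratio skewness m3 / m2^1.5 and kurtosis m4 / m2^2 (assumed definitions)."""
    n = len(e)
    mean = sum(e) / n
    m2 = sum((v - mean) ** 2 for v in e) / n
    m3 = sum((v - mean) ** 3 for v in e) / n
    m4 = sum((v - mean) ** 4 for v in e) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def synthetic_ar2(n, seed=0):
    """Hypothetical stand-in signal: a stable AR(2) process driven by Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    signal = synthetic_ar2(4000)
    residual = lpc_residual(signal, order=2)
    skew, kurt = normalized_skewness_kurtosis(residual)
    print(f"skewness={skew:.3f} kurtosis={kurt:.3f}")
```

With this definition a Gaussian residual gives a skewness near 0 and a kurtosis near 3. In a pipeline like the one the abstract describes, the two values per recording would then serve as inputs to the CART classifier.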

Highlights

  • Pathological voice quality assessment has attracted attention for many years, prompting a large body of research based on acoustical, aerodynamic, and physiological measurements [1,2,3,4,5,6]

  • We propose a novel scheme for pathological voice quality measurement based on higher-order statistics (HOSs)

  • Using multiple parameters extracted from pathological and normal voices, the classification and regression tree (CART) makes the final decision on whether the current phonation is a normal voice or a slightly, moderately, or severely pathological voice


Summary

Introduction

Pathological voice quality assessment has attracted attention for many years, prompting a large body of research based on acoustical, aerodynamic, and physiological measurements [1,2,3,4,5,6]. Our goal is to assess overall pathological voice quality, which is scored on a four-point scale: 0 = normal, 1 = mild deviance, 2 = moderate deviance, and 3 = severe deviance. Gu et al. suggested three objective quality assessment measures: Itakura-Saito (IS) distortion, log-likelihood ratio (LLR), and log-area-ratio (LAR). The IS measure was suggested to be more suitable than LLR and LAR as a reliable tool for evaluating the overall quality of disordered speech [1]. In [4], the best result was obtained using 21 input parameters, for which an accuracy of 92% was achieved.
