Abstract

Perceptual evaluation of the patient’s voice is the most commonly used method in everyday clinical practice. We propose an automatic approach for predicting the severity of several types of organic and functional dysphonia. Using an unsupervised learning method, we demonstrated that acoustic parameters measured on different phonetic classes are suitable for modelling the four-grade assessments of the specialists (the subjective RBH scale, from 0 to 3). In this study, the overall hoarseness grade H was examined. Four specialists were asked to rate the severity of dysphonia. A k-means cluster analysis was performed separately on the decisions of each specialist; the average accuracy of the four-grade classification was 0.46, surprisingly close to the subjective judgements. Moreover, automatic estimation of the severity of dysphonia was also carried out: linear regression and RBF kernel regression models were compared, with the average rating of the four specialists used as the target. Low RMSE and high correlation were obtained between the automatically predicted severity and the perceptual assessments. The best RMSE for H was 0.45, achieved by the RBF kernel model; however, a simpler linear model provided the highest correlation, 0.85, using only eight acoustic parameters.
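The abstract evaluates predicted severity against perceptual ratings with RMSE and correlation. As a minimal sketch of those two metrics (the rating values below are hypothetical, not the study’s data):

```python
import math

def rmse(pred, target):
    """Root-mean-square error between predicted and perceptual ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def pearson(pred, target):
    """Pearson correlation between predicted and perceptual ratings."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    return cov / (sp * st)

# Hypothetical example: averaged H ratings of four raters vs. model output
target = [0.0, 0.75, 1.5, 2.25, 3.0]
pred   = [0.2, 0.9,  1.3, 2.4,  2.8]
print(rmse(pred, target))
print(pearson(pred, target))
```

A low RMSE with a high Pearson correlation, as reported in the study (0.45 and 0.85 respectively), indicates that the automatic predictions track the averaged perceptual judgements closely.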

Highlights

  • Dysphonia refers to the dysfunction in the ability to produce voice

  • In Tulics and Vicsi (2017) we demonstrated that these parameters, together with the Soft Phonation Index (SPI) and Empirical Mode Decomposition (EMD) based frequency band ratios measured on different phonetic classes, correlate with the severity of dysphonia

  • The Soft Phonation Index (SPI) and Empirical Mode Decomposition (EMD) based frequency band ratios were measured on the voiced parts of speech, and the measured parameters were grouped into different phonetic classes


Introduction

Dysphonia refers to a dysfunction in the ability to produce voice. Perceptually, dysphonia can be characterized by hoarse, breathy, harsh or rough vocal qualities, although some kind of phonation remains (Hirschberg et al., 2013). Acoustic measures are typically derived from sustained vowel samples, but the analysis of continuous speech has several advantages: it contains variation of fundamental frequency, pauses and phonation onsets, and it offers the opportunity to examine different variations of speech sounds. The most widely used acoustic parameters regarding dysphonia include jitter, shimmer and the Harmonics-to-Noise Ratio (HNR). Zhang and colleagues (Zhang and Jiang, 2008) found that jitter and shimmer statistically differentiate between normal and pathological sustained vowels but did not show such a significant difference between normal and pathological continuous speech. Our previous research has confirmed that acoustic parameters like jitter, shimmer, HNR and the first component (c1) of the mel-frequency cepstral coefficients (referred to as ‘mfcc01’) are useful in the automatic classification of healthy and pathological voices using continuous speech (Vicsi et al., 2011; Kazinczi et al., 2015; Grygiel et al., 2012).
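To make the core perturbation measures concrete, here is a minimal sketch of local jitter and local shimmer computed from pre-extracted glottal cycle lengths and peak amplitudes. The cycle data below are hypothetical, and real analysis tools (e.g. Praat) apply additional windowing and voicing constraints:

```python
def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal cycle lengths, relative to the mean cycle length."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amps):
    """Local shimmer: the same ratio computed on peak amplitudes."""
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))

# Hypothetical cycle data (period in seconds, linear peak amplitude)
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amps    = [0.80, 0.78, 0.82, 0.79, 0.81]
print(jitter_local(periods))   # cycle-to-cycle frequency perturbation
print(shimmer_local(amps))     # cycle-to-cycle amplitude perturbation
```

Higher jitter and shimmer values indicate greater cycle-to-cycle instability of phonation, which is why these parameters are widely used to separate healthy from pathological voices.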

Methods and materials
Pathological and healthy adults speech database
Recording environment and text material
Initial database
Selected database
RBH scale
Acoustic parameters
Decision methods
Two‐class classification results
Unsupervised cluster analysis
Reliability analysis
Regression analysis
Conclusion and discussion
Two‐class classification and parameter selection
Clustering
Regression