Abstract

Perceptual evaluation of the patient’s voice is the most commonly used method in everyday clinical practice. We propose an automatic approach for predicting the severity of several types of organic and functional dysphonia. Using an unsupervised learning method, we demonstrated that acoustic parameters measured on different phonetic classes are suitable for modelling the four-grade assessments of the specialists (the subjective RBH scale, from 0 to 3). In this study, the overall hoarseness grade H was examined. Four specialists were asked to rate the severity of dysphonia. A k-means cluster analysis was performed separately on the decisions of each specialist; the average accuracy of the four-grade classification was 0.46, surprisingly close to the subjective judgements. Moreover, automatic estimation of the severity of dysphonia was also carried out: linear regression and RBF kernel regression models were compared, with the average rating of the four specialists used as the target. Low RMSE and high correlation were obtained between the automatically predicted severity and the perceptual assessments. The best RMSE for H was 0.45, achieved by the RBF kernel model; however, a simpler linear model provided the highest correlation, 0.85, using only eight acoustic parameters.
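The abstract evaluates predicted severity against perceptual ratings with RMSE and correlation. As a minimal sketch of those two metrics (the rating values below are hypothetical, not the study’s data):

```python
import math

def rmse(pred, target):
    """Root-mean-square error between predicted and perceptual ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def pearson(pred, target):
    """Pearson correlation between predicted and perceptual ratings."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    return cov / (sp * st)

# Hypothetical example: averaged H ratings of four raters vs. model output
target = [0.0, 0.75, 1.5, 2.25, 3.0]
pred   = [0.2, 0.9,  1.3, 2.4,  2.8]
print(rmse(pred, target))
print(pearson(pred, target))
```

A low RMSE with a high Pearson correlation, as reported in the study (0.45 and 0.85 respectively), indicates that the automatic predictions track the averaged perceptual judgements closely.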

Highlights

  • Dysphonia refers to the dysfunction in the ability to produce voice

  • In Tulics and Vicsi (2017) we demonstrated that these parameters, together with the Soft Phonation Index (SPI) and Empirical Mode Decomposition (EMD) based frequency band ratios measured on different phonetic classes, correlate with the severity of dysphonia

  • The Soft Phonation Index (SPI) and Empirical Mode Decomposition (EMD) based frequency band ratios were measured on the voiced parts of speech, and the measured parameters were grouped into different phonetic classes


Introduction

Dysphonia refers to a dysfunction in the ability to produce voice. Perceptually, dysphonia can be characterized by hoarse, breathy, harsh or rough vocal qualities, although some kind of phonation remains (Hirschberg et al., 2013). Acoustic measures are typically derived from sustained vowel samples, but the analysis of continuous speech has several advantages: it contains variation of fundamental frequency, pauses and phonation onsets, and it offers the opportunity to examine different variations of speech sounds. The most widely used acoustic parameters regarding dysphonia include jitter, shimmer and the Harmonics-to-Noise Ratio (HNR). Zhang and colleagues (Zhang and Jiang, 2008) found that jitter and shimmer statistically differentiate between normal and pathological sustained vowels but did not show such a significant difference between normal and pathological continuous speech. Our previous research has confirmed that acoustic parameters like jitter, shimmer, HNR and the first component (c1) of the mel-frequency cepstral coefficients (referred to as ‘mfcc01’) are useful in the automatic classification of healthy and pathological voices using continuous speech (Vicsi et al., 2011; Kazinczi et al., 2015; Grygiel et al., 2012).
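To make the core perturbation measures concrete, here is a minimal sketch of local jitter and local shimmer computed from pre-extracted glottal cycle lengths and peak amplitudes. The cycle data below are hypothetical, and real analysis tools (e.g. Praat) apply additional windowing and voicing constraints:

```python
def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal cycle lengths, relative to the mean cycle length."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amps):
    """Local shimmer: the same ratio computed on peak amplitudes."""
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))

# Hypothetical cycle data (period in seconds, linear peak amplitude)
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amps    = [0.80, 0.78, 0.82, 0.79, 0.81]
print(jitter_local(periods))   # cycle-to-cycle frequency perturbation
print(shimmer_local(amps))     # cycle-to-cycle amplitude perturbation
```

Higher jitter and shimmer values indicate greater cycle-to-cycle instability of phonation, which is why these parameters are widely used to separate healthy from pathological voices.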

Methods and materials
Pathological and healthy adults speech database
Recording environment and text material
Initial database
Selected database
RBH scale
Acoustic parameters
Decision methods
Two‐class classification results
Unsupervised cluster analysis
Reliability analysis
Regression analysis
Conclusion and discussion
Two‐class classification and parameter selection
Clustering
Regression