Different Performances of Machine Learning Models to Classify Dysphonic and Non-Dysphonic Voices

Danilo Rangel Arruda Leite,Ronei Marcos De Moraes,Leonardo Wanderley Lopes

doi:10.1016/j.jvoice.2022.11.001

Danilo Rangel Arruda Leite, Ronei Marcos De Moraes + Show 1 more

https://doi.org/10.1016/j.jvoice.2022.11.001

Copy DOI

Abstract

To analyze the performance of 10 different machine learning (ML) classifiers for discrimination between dysphonic and non-dysphonic voices, using a variance threshold as a method for the selection and reduction of acoustic measurements used in the classifier. We analyzed 435 samples of individuals (337 female and 98 male), with a mean age of 41.07 ± 13.73 years, of which 384 were dysphonic and 51 were non-dysphonic. From the sustained /ε/ vowel sample, 34 acoustic measurements were extracted, including traditional perturbation and noise measurements, cepstral/spectral measurements, and measurements based on nonlinear models. The variance method was used to select the best set of acoustic measurements. We tested the performance of the best-selected set with 10 ML classifiers using precision, sensitivity, specificity, accuracy, and F1-Score measurements. The kappa coefficient was used to verify the reproducibility between the two datasets (training and testing). The naive Bayes (NB) and stochastic gradient descent classifier (SGDC) models performed best in terms of accuracy, AUC, sensitivity, and specificity for a reduced dataset of 15 acoustic measures compared to the full dataset of 34 acoustic measures. SGDC and NB obtained the best performance results, with an accuracy of 0.91 and 0.76, respectively. These two classifiers presented moderate agreement, with a Kappa of 0.57 (SGDC) and 0.45 (NB). Among the tested models, the NB and SGDC models performed better in discriminating between dysphonic and non-dysphonic voices from a set of 15 acoustic measures.

Full Text