Hierarchical Multi-Class Classification of Voice Disorders Using Self-Supervised Models and Glottal Features

Saska Tirronen, Sudarsana Reddy Kadiri, Paavo Alku

doi:10.1109/ojsp.2023.3242862

Open Access

Journal: IEEE Open Journal of Signal Processing
Publication Date: Jan 1, 2023
Citations: 12
License type: CC BY 4.0

Affiliation: Aalto University

Abstract

Previous studies on the automatic classification of voice disorders have mostly investigated the binary classification task, which aims to distinguish pathological voice from healthy voice. Multi-class classifiers, however, can identify voice disorders at a finer granularity, which is more helpful for clinical practitioners. Unfortunately, there is little publicly available training data for many voice disorders, which lowers the classification performance on data from unseen speakers. Earlier studies have shown that the use of glottal source features can reduce data redundancy in the detection of laryngeal voice disorders. Another approach to tackling the problems caused by the scarcity of training data is to use deep learning models, such as wav2vec 2.0 and HuBERT, that have been pre-trained on larger databases. Since these two approaches have not been thoroughly studied in the multi-class classification of voice disorders, they are studied jointly in the present work. In addition, we study a hierarchical classifier, which enables task-wise feature optimization and more efficient utilization of data. These three approaches are compared with traditional mel-frequency cepstral coefficient (MFCC) features and with one-vs-rest and one-vs-one SVM classifiers. The results in a 3-class classification problem between healthy voice and two laryngeal disorders (hyperfunctional dysphonia and vocal fold paresis) indicate that all the studied methods outperform the baselines. The best performance was achieved by using features from wav2vec 2.0 LARGE together with hierarchical classification. The balanced classification accuracy of this system was 62.77% for male speakers and 55.36% for female speakers, an absolute improvement over the baseline systems of 15.76 and 6.95 percentage points for male and female speakers, respectively.
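As a rough illustration of the two ideas the abstract relies on, the sketch below shows a two-stage hierarchical decision and the balanced accuracy metric (the mean of per-class recalls). It is not the authors' implementation. The stage order (healthy vs. disordered first, then a choice between the two disorders) is an assumption, as are all names, the toy scores, and the threshold rules, which stand in for trained classifiers that could each use their own feature set.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence

# Class labels taken from the 3-class problem in the abstract.
HEALTHY = "healthy"
HYPERFUNCTIONAL = "hyperfunctional_dysphonia"
PARESIS = "vocal_fold_paresis"

Sample = Dict[str, float]


def hierarchical_predict(
    sample: Sample,
    is_disordered: Callable[[Sample], bool],
    which_disorder: Callable[[Sample], str],
) -> str:
    """Two-stage decision (assumed order): stage 1 separates healthy from
    disordered voice; stage 2 runs only on samples flagged as disordered."""
    if not is_disordered(sample):
        return HEALTHY
    return which_disorder(sample)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical stand-ins for trained stage classifiers. In a real system
# each stage could read a different feature set (task-wise optimization).
def toy_stage1(sample: Sample) -> bool:
    return sample["stage1_score"] > 0.5


def toy_stage2(sample: Sample) -> str:
    return PARESIS if sample["stage2_score"] > 0.5 else HYPERFUNCTIONAL


# Toy data, invented for illustration only.
samples = [
    {"stage1_score": 0.1, "stage2_score": 0.0},
    {"stage1_score": 0.9, "stage2_score": 0.2},
    {"stage1_score": 0.8, "stage2_score": 0.9},
    {"stage1_score": 0.3, "stage2_score": 0.9},  # missed by stage 1
]
y_true = [HEALTHY, HYPERFUNCTIONAL, PARESIS, PARESIS]
y_pred = [hierarchical_predict(s, toy_stage1, toy_stage2) for s in samples]

print(y_pred)
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")
```

Because balanced accuracy averages recall over classes, a classifier that always predicts the majority class scores only 1/3 on a 3-class problem, which is why it suits the small, imbalanced datasets the abstract describes.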

Similar Papers

Classification of functional dysphonia using the tunable Q wavelet transform
Kiran Reddy Mittapalle ... Paavo Alku
Speech Communication | VOL. 155
06 Oct 2023

Classification of voice disorders using i-Vector analysis
Kshipra Naikare ... Nikunj Lad
01 Feb 2018

Voice Disorders and their Management
01 Jan 1991

Classification of Healthy and Pathological voices using MFCC and ANN
Smitha ... Sarika Hegde
01 Feb 2018
