Multicondition Training for Noise-Robust Detection of Benign Vocal Fold Lesions From Recorded Speech

Mario Madruga, Yolanda Campos-Roca, Carlos J Perez

doi:10.1109/access.2020.3046873

Abstract

This study evaluates the effects of Multicondition Training (MCT) on computer-aided diagnosis systems for voice quality assessment associated with exudative lesions of Reinke’s space. This technique adds various noise conditions to the speech recordings in order to recreate realistic acoustic environments. Four different databases (Massachusetts Eye and Ear Infirmary, UEX-Voice, Saarbrucken, and Hospital Universitario Principe de Asturias), recorded in very different acoustic environments, are used. We compare the outcomes of random forest classifier models, comprising feature selection, hyperparameter tuning, and cross-validation, according to the specific MCT scheme used to separate healthy from pathological subjects for three diseases (nodules, polyps, and Reinke’s edema). Apart from the clean-case baseline, one asymmetric noise-based MCT scenario (each subject recording is affected by only one noise recording) and two symmetric ones (each subject recording is affected by all the noise recordings) are considered. These scenarios are created by adding realistic acoustic noise of different types to the sustained /a/ vowel recordings. The symmetric approaches are affected by methodological concerns and are tested for comparative purposes, to emphasize these issues. Experimental results highlight the drawbacks of symmetric MCTs and exclude these techniques as a viable option. In contrast, asymmetric MCT proves to be a suitable noise-robust approach to building a diagnosis system for exudative lesions of Reinke’s space, as the performance obtained with the resulting classifiers is not far from the performance obtained with clean training.
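The difference between the asymmetric and symmetric scenarios described in the abstract can be illustrated with a minimal sketch. It only builds the (recording, noise) pairings; the function names, the random choice of noise in the asymmetric case, and the example labels are assumptions made for illustration, not the authors' implementation, and the sketch does not distinguish between the two symmetric variants mentioned in the abstract.

```python
import random


def asymmetric_mct(recordings, noises, seed=0):
    """Pair each subject recording with exactly one noise recording.

    Picking the noise at random is an assumption; the abstract only
    states that each recording is affected by a single noise recording.
    """
    rng = random.Random(seed)
    return [(rec, rng.choice(noises)) for rec in recordings]


def symmetric_mct(recordings, noises):
    """Pair every subject recording with every noise recording."""
    return [(rec, noise) for rec in recordings for noise in noises]


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the shape of the output.
    recordings = ["subject_01", "subject_02", "subject_03"]
    noises = ["noise_a", "noise_b"]
    print(asymmetric_mct(recordings, noises))
    print(symmetric_mct(recordings, noises))
```

In this sketch the asymmetric set keeps one corrupted copy per subject, while the symmetric set contains one copy per subject and noise, so each subject appears several times. That is worth keeping in mind when the data are later split for cross-validation.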

Highlights

Human voice production can be affected by a wide range of conditions, either voice-specific ones such as nodules, polyps, and cleft lip and palate, or other disorders that affect motor control, such as neurodegenerative diseases.
For this reason, Computer Aided Diagnosis (CAD) tools are of great interest, since they can support diagnosis procedures by using voice recordings as a noninvasive biomarker.
All vocal recordings were processed in the same way. First, all samples were trimmed to a length of 1 second to ensure homogeneous length across databases. Next, they were downsampled to 16 kHz prior to corruption, to match the sampling rate of the noise files. Noise from all sources was then added at all proposed Signal-to-Noise Ratios (SNRs). Before feature extraction, the sound files were preprocessed by normalizing amplitude to the range [−1, 1]. Finally, feature extraction was performed for each recording.
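The processing chain described above (trim, downsample, add noise at a target SNR, normalize) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the block-averaging downsampler is a crude stand-in for a proper resampler, keeping the first second of audio is an assumption, all function names are hypothetical, and feature extraction is left out because the text does not describe it.

```python
import math

TARGET_RATE = 16_000  # Hz, as stated in the text
CLIP_SECONDS = 1.0    # clip length, as stated in the text


def trim(signal, rate, seconds=CLIP_SECONDS):
    """Keep the first `seconds` of audio (which segment is kept is an assumption)."""
    return signal[: int(round(rate * seconds))]


def decimate(signal, rate, target_rate=TARGET_RATE):
    """Crude integer-factor downsampling by block averaging (illustrative only)."""
    if rate % target_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = rate // target_rate
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


def power(signal):
    """Mean squared amplitude of the signal."""
    return sum(x * x for x in signal) / len(signal)


def add_noise(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches snr_db, then add it."""
    # Loop the noise if it is shorter than the speech (an assumption).
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise recording is silent")
    gain = math.sqrt(power(speech) / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def normalize(signal):
    """Peak-normalize amplitude to the range [-1, 1]."""
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def preprocess(speech, rate, noise, snr_db):
    """Apply the steps in the order given in the text, up to feature extraction."""
    clipped = trim(speech, rate)
    resampled = decimate(clipped, rate)
    noisy = add_noise(resampled, noise, snr_db)
    return normalize(noisy)
```

The noise gain follows from the definition SNR(dB) = 10 log10(P_speech / P_noise): solving for the noise scale gives the square root used in `add_noise`. The noise signal is assumed to be sampled at 16 kHz already, which is why only the speech is downsampled.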

Summary

Introduction

Human voice production can be affected by a wide range of conditions, either voice-specific ones such as nodules, polyps, and cleft lip and palate, or other disorders that affect motor control, such as neurodegenerative diseases. Nodules, polyps, and Reinke’s edema are the main lesions that occur in Reinke’s space [1]. Although their etiologic factors are different, their pathologic features are quite similar, and diagnosis usually relies on the clinical description. Classical voice quality assessment relies on cumbersome techniques such as videostroboscopy or laryngoscopy, procedures that are highly invasive and uncomfortable for patients and that require expensive equipment and expert practitioners. For this reason, Computer Aided Diagnosis (CAD) tools are of great interest, since they can support diagnosis procedures by using voice recordings as a noninvasive biomarker. They are non-intrusive, as they only perform signal processing of voice samples [2].

Methods

Results

Discussion

Conclusion

Journal: IEEE Access	Publication Date: Dec 23, 2020
Citations: 58	License type: CC BY 4.0

Similar Papers

Enhancing noise robustness of automatic Parkinson’s disease detection in diadochokinesis tests using multicondition training
Mario Madruga Escalona ... Carlos Javier Pérez Sánchez
Expert Systems With Applications | VOL. 260
18 Sep 2024

Estimation of Speech Intelligibility Using Speech Recognition Systems
Yusuke Takano ... Kazuhiro Kondo
IEICE Transactions on Information and Systems | VOL. E93-D
01 Jan 2009

Multi-Condition Training for Unknown Environment Adaptation in Robust ASR Under Real Conditions
J Rajnoha
Acta Polytechnica | VOL. 49
02 Jan 2009

Assessing Outcomes in Facial Reanimation: Evaluation and Validation of the SMILE System for Measuring Lip Excursion During Smiling
Dominic Bray ... Tessa A Hadlock
Archives of Facial Plastic Surgery | VOL. 12
01 Sep 2010
