The Effect of Noise on Deep Learning for Classification of Pathological Voice.

Koki Hasebe, Yo Kishimoto, Yoshitaka Kawai, Shintaro Fujimura, Keiichi Tamura, Tsuyoshi Kojima, Koichi Omori

doi:10.1002/lary.31303

Abstract

This study aimed to evaluate the effect of background noise on machine learning models that assess the GRBAS scale for voice disorders. A dataset of 1406 voice samples was collected from retrospective data, and a 5-layer 1D convolutional neural network (CNN) model was constructed using TensorFlow. The dataset was divided into training, validation, and test data. Gaussian noise was added to the test samples at various intensities to assess the model's noise resilience. Performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen's kappa. The model's performance on the GRBAS scale generally declined as noise intensity increased. For the G scale, accuracy dropped from 70.9% (original) to 8.5% (at the highest noise level), F1 score from 69.2% to 1.3%, and Cohen's kappa from 0.679 to 0.0. Similar declines were observed for the remaining RBAS components. The model was therefore sensitive to background noise, with substantial decreases in all evaluation metrics as noise levels intensified. Future research should explore noise-tolerant techniques, such as data augmentation, to improve the model's noise resilience in real-world settings. This study evaluates a machine learning model using a single dataset without comparative controls. Given its non-comparative design and specific focus, it aligns with Level 4 evidence (case series) under the 2011 OCEBM guidelines. Laryngoscope, 134:3537-3541, 2024.
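The evaluation described in the abstract (perturbing test samples with Gaussian noise, then scoring predictions with quadratic weighted Cohen's kappa) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of a noise standard deviation as the "intensity" parameter, and the convention for the degenerate case are all assumptions made for illustration. The only domain fact relied on is that each GRBAS component is graded on a four-point ordinal scale (0 to 3).

```python
import random
from collections import Counter


def add_gaussian_noise(signal, noise_std, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    `noise_std` is a hypothetical intensity parameter; the abstract does
    not state how the noise intensity was parameterized.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic weights for labels 0..n_classes-1."""
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))  # joint counts of (true, predicted)
    true_hist = Counter(y_true)              # marginal counts of true grades
    pred_hist = Counter(y_pred)              # marginal counts of predicted grades

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight grows with the squared grade distance.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[(i, j)]
            # Count expected by chance if raters were independent.
            weighted_expected += w * true_hist[i] * pred_hist[j] / n

    if weighted_expected == 0.0:
        # Degenerate case (every label and prediction is the same grade):
        # kappa is undefined; returning 0.0 is a convention chosen here.
        return 0.0
    return 1.0 - weighted_observed / weighted_expected


# Toy waveform and toy grades, purely illustrative (not study data).
clean = [0.0, 0.5, -0.5, 0.25]
noisy = add_gaussian_noise(clean, noise_std=0.1, seed=0)

grades = [0, 1, 2, 3]
constant_prediction = [1, 1, 1, 1]
kappa_constant = quadratic_weighted_kappa(grades, constant_prediction)
kappa_perfect = quadratic_weighted_kappa(grades, grades)
```

In this sketch, a classifier that outputs the same grade for every sample scores a kappa of zero, because its agreement with the reference is exactly what chance would produce. A collapse to a single predicted class would be one possible way for kappa to reach 0.0 while accuracy stays above zero; the abstract itself does not say whether that happened.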

Full Text