Abstract
The human voice is an essential communication tool, but various disorders and habits can disrupt it, so diagnosing pathological and abnormal voices is important. Conventional diagnosis of voice pathologies can be invasive and costly. Artificial Intelligence and computer-aided classification tools can detect voice pathologies effectively, but previous studies focused primarily on binary classification and paid limited attention to multi-class classification. This study proposes three neural network architectures to investigate the feature characteristics of three voice pathologies (Hyperkinetic Dysphonia, Hypokinetic Dysphonia, and Reflux Laryngitis) alongside healthy voices, using multi-class classification on the Voice ICar fEDerico II (VOICED) dataset. To cope with noisy data, the study proposes a UNet++ autoencoder-based denoiser for accurate feature extraction. The architectures are a Multi-Layer Perceptron (MLP) trained on structured feature sets, a Short-Time Fourier Transform (STFT) model, and a Mel-Frequency Cepstral Coefficients (MFCC) model. The MLP model, trained on 143 features, achieved 97.1% accuracy, while the STFT model showed similar performance with an increased sensitivity of 99.8%. The MFCC model maintained 97.1% accuracy with a smaller model size and improved accuracy on the Reflux Laryngitis class. Saliency analysis identifies the crucial features and reveals that detecting voice abnormalities requires identifying regions of inaudible high-pitch sounds. Additionally, the study highlights the challenges posed by limited and disjointed pathological voice databases and proposes solutions for enhancing the performance of voice abnormality classification. Overall, the study’s findings have potential uses in clinical settings and in specialized audio-capturing tools.
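As a rough illustration of the STFT representation named above, the following minimal sketch computes a magnitude spectrogram with NumPy. It is not the study's pipeline: the function name `stft_magnitude`, the frame length, the hop size, the Hann window, the sample rate, and the synthetic test tone are all assumptions chosen for illustration, and no denoising or classification is attempted.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram of a 1-D signal, shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative defaults, not values from the study.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping frames (trailing partial frame is dropped).
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # Real-input FFT of each windowed frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Synthetic stand-in for a voice recording: a 1 kHz tone, 1 second long.
sample_rate = 8000  # assumed sample rate for this example
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

spec = stft_magnitude(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index converted to frequency
print(spec.shape, peak_hz)
```

A time-frequency array like `spec` is the kind of input an STFT-based classifier could consume; how the study frames, scales, or denoises its spectrograms is not stated in the abstract.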