Abstract
Vocal disorders are potentially underreported due to invasive diagnostic methods and a lack of general awareness. Traditionally, vocal damage is diagnosed through laryngoscopy, but recent research supports using audio processing and machine learning to distinguish not only between healthy and unhealthy voices but also between different vocal pathologies. This study introduces a framework for classifying multiple vocal pathologies (dysphonia, polyps, nodules, and paralysis), building on previous research focused on two-class classification. The approach employs continuous wavelet transforms (CWT) and spectrograms of pathological voices, computed from /AH/ and /EE/ recordings from the Indiana University (IU) Health Voice Center, as inputs to a convolutional neural network (CNN) classifier. Wavelet-transform images achieve higher accuracy than spectrograms for both two-class and multiclass classification. Various wavelet shapes and sizes are compared on accuracy and processing time, and different signal-to-noise ratios and augmentation techniques are tested for classification robustness and accuracy. The study also compares classification accuracy between male and female voices and examines how parameters, such as learning rate and the number of filter layers, affect CNN performance. Results indicate that accuracy improves with a learning rate of 0.0001 and appropriately sized input images, demonstrating the potential of careful parameter tuning to enhance classification models.
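The abstract describes the scalogram-to-CNN pipeline only at a high level. Below is a minimal sketch of that path, assuming Python with the PyWavelets, soundfile, and PyTorch libraries; the file name, Morlet wavelet, scale range, one-second window, and network depth are illustrative assumptions, while the four pathology classes and the 0.0001 learning rate come from the abstract.

```python
import numpy as np
import pywt
import soundfile as sf
import torch
import torch.nn as nn

# Load a sustained-vowel recording (file name is hypothetical) and
# keep a fixed one-second window so every scalogram has the same width.
signal, sr = sf.read("ah_sample.wav")
signal = signal[:sr]

# Continuous wavelet transform with a Morlet wavelet over 128 scales,
# yielding a 2-D time-scale image (scalogram) analogous to a spectrogram.
scales = np.arange(1, 129)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / sr)
scalogram = np.abs(coeffs)

class VowelCNN(nn.Module):
    """Small CNN over scalogram images; depth and widths are illustrative."""
    def __init__(self, num_classes=4):  # dysphonia, polyp, nodule, paralysis
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # pool to one value per channel
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = VowelCNN()
# Learning rate of 0.0001, the value the abstract reports as performing best.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One forward pass: (batch, channel, scale, time) -> one score per class.
x = torch.from_numpy(scalogram).float().unsqueeze(0).unsqueeze(0)
logits = model(x)
```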