Abstract

Millions of Americans suffer from vocal damage that can negatively impact their daily lives and incur substantial health care costs. Vocal damage occurs frequently in at-risk populations, including singers, teachers, coaches, and telemarketers. The current standard method of diagnosis is laryngoscopy. Prior research has shown that digital signal processing combined with acoustic machine learning methods can distinguish healthy from unhealthy voices, but there has been limited work on classifying different pathology types from one another using only acoustic machine learning and signal processing. This study aims to design a convolutional neural network (CNN) that differentiates between vocal pathologies using audio recordings of the sustained vowel /a/ from damaged voices as input. The audio dataset of different vocal pathologies was obtained from the Indiana University Vocal Pathology Dataset. Because the CNN requires image inputs, continuous wavelet transforms (CWT) are computed, producing scalograms that display spectral energy across time in a visual format. The results report the accuracy of two-class (pairwise) classification of different vocal pathologies and how accuracy changes as the parameters of the CNN architecture are varied.
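The abstract describes converting /a/ vowel recordings into CWT scalogram images for CNN input. The sketch below illustrates that preprocessing step with a minimal Morlet-wavelet CWT in plain NumPy; the function name, scale range, wavelet parameter `w0`, and the synthetic two-harmonic "vowel" signal are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Continuous wavelet transform with a Morlet mother wavelet.

    `scales` are in samples. Returns a (len(scales), len(signal)) array of
    coefficient magnitudes -- a scalogram that can be rendered as an image
    and fed to a CNN. This is an illustrative sketch, not the paper's code.
    """
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        m = int(min(10 * s, n))            # truncate wavelet support
        t = np.arange(-m, m + 1)           # time axis in samples
        wavelet = (np.pi ** -0.25) * np.exp(1j * w0 * t / s) \
                  * np.exp(-((t / s) ** 2) / 2)
        wavelet /= np.sqrt(s)              # energy normalisation across scales
        out[i] = np.abs(np.convolve(signal, np.conj(wavelet), mode="same"))
    return out

# Toy stand-in for a recorded /a/ vowel: fundamental plus one harmonic
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

scalogram = morlet_cwt(x, scales=np.arange(1, 33))
print(scalogram.shape)  # → (32, 800): one row per scale, one column per sample
```

In practice a library such as PyWavelets (`pywt.cwt`) would be used instead of a hand-rolled loop, and the resulting magnitude array would be colour-mapped and saved as an image before being passed to the CNN.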
