Voice pathology identification system using a deep learning approach based on unique feature selection sets

Nuha Qais Abdulmajeed, Mazin Abed Mohammed, Belal Al‐Khateeb

doi:10.1111/exsy.13327

Abstract

Voice pathology diagnosis requires extracting significant features from voice signals, and classical machine learning models can overfit the training data, which makes reliable diagnosis difficult. This study aimed to develop a reliable and efficient system for identifying voice pathologies using the long short‐term memory (LSTM) method. It combined three feature sets that have not been used together in previous works: mel frequency cepstral coefficients (MFCCs), zero crossing rate (ZCR), and mel spectrograms. Applied to samples from the Saarbruecken voice database (SVD), the LSTM approach improved the accuracy of voice pathology identification. The best results achieved by the proposed system were accuracy rates of 99.3% for /u/ vowel samples in neutral pitch, 99.2% for /a/ vowel samples in high pitch, 99% for /i/ vowel samples in neutral pitch, and 99.2% for sentence samples. The experimental results were evaluated using accuracy, precision, specificity, sensitivity, and F1 measures. The study also compared the performance of LSTM with that of artificial neural networks (ANNs) and found that LSTM achieved better outcomes.
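
For readers unfamiliar with the five evaluation measures named in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary task (pathological versus healthy) from confusion-matrix counts. The function name `binary_metrics` and the example counts are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from raw counts.

    tp/fn: pathological samples predicted as pathological/healthy.
    tn/fp: healthy samples predicted as healthy/pathological.
    (Treating "pathological" as the positive class is an assumption.)
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity (recall): share of pathological samples detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of healthy samples correctly identified.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Reporting specificity alongside sensitivity matters in this setting because accuracy alone can look high on an imbalanced dataset even when one class is poorly recognized.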

Full Text