This study presents an automated, noninvasive voice disorder detection and classification approach using an optimized fusion of modified glottal source estimation and deep transfer learning neural network descriptors. A new set of modified descriptors, based on a glottal source estimator and features from a pre-trained Inception-ResNet-v2 convolutional neural network, is proposed for the speech disorder detection and classification task. The modified feature set is obtained using mel-cepstral coefficients, harmonic model, phase distortion mean, phase distortion deviation, conventional wavelet, and glottal source estimation features. Early, descriptor-level fusion is employed for performance enhancement; however, the fusion increases the dimensionality of the feature vector. A nature-inspired slime mould algorithm is therefore utilized to remove redundant features and select the most discriminating ones. Finally, classification is performed using the K-nearest neighbor (KNN) classifier. The proposed algorithm was evaluated through extensive experiments with different feature combinations, with and without feature selection, on two popular datasets: the Arabic Voice Pathology Database (AVPD) and the Saarbrücken Voice Database (SVD). The proposed optimized fusion method attained an enhanced voice pathology detection accuracy of 98.46% across a wide spectrum of voice disorders on the SVD dataset. Furthermore, compared with traditional handcrafted and deep neural network-based techniques, the proposed method demonstrates competitive performance with fewer features.
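The pipeline summarized above (early descriptor-level fusion, optimizer-driven feature selection, then KNN classification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, sample counts, and the random binary mask standing in for the slime mould algorithm's selected subset are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two descriptor streams (hypothetical shapes):
# 200 utterances, 64 handcrafted glottal/cepstral features, 128 deep CNN features.
handcrafted = rng.normal(size=(200, 64))
deep = rng.normal(size=(200, 128))
labels = rng.integers(0, 2, size=200)  # 0 = healthy, 1 = pathological

# Early (descriptor-level) fusion: concatenate the per-utterance feature vectors.
fused = np.hstack([handcrafted, deep])  # shape (200, 192)

# The paper selects features with the slime mould algorithm; here a fixed
# random binary mask merely stands in for that optimizer's output subset.
mask = rng.random(fused.shape[1]) < 0.5
selected = fused[:, mask]

# Classify the reduced feature vectors with a K-nearest neighbor classifier.
X_tr, X_te, y_tr, y_te = train_test_split(selected, labels, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
accuracy = knn.score(X_te, y_te)
```

In the actual method, the binary mask would be the best candidate solution found by the slime mould optimizer, typically scored by cross-validated classifier accuracy penalized by subset size.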