Abstract

In recent years, deep learning methods for automatic pathological voice detection (APVD) have achieved promising results. However, most deep learning methods in APVD cannot explain their predictions. Interpretability is crucial for deep learning methods applied in the medical field: without it, existing methods struggle to achieve better generalization than methods based on meaningful, hand-crafted features in practical applications. This paper proposes an interpretable neural network architecture, the Interpretable Multi-band Feature Extraction Network (IMBFN), which combines a clear feature-extraction logic with a comprehensive decision rule to improve the effectiveness and generalization performance of APVD. An amplitude-trainable SincNet (AT-SincNet) filter bank is put forward and applied as the front-end frequency-division network of IMBFN. In addition, IMBFN uses a purpose-built two-path one-dimensional depthwise separable convolutional neural network (CNN) feature extractor to extract meaningful voice features. The classification results of the individual voice frames are then aggregated to judge whether the voice is pathological. Comparative experiments were conducted using data from the MEEI, SVD, and HUPA databases; the largest improvements in accuracy, F1-score, and Matthews correlation coefficient (MCC) reached 0.1705, 0.1977, and 0.4463, respectively. Blind tests were also carried out on participants from the First Affiliated Hospital of Soochow University, yielding an accuracy, F1-score, and MCC of 0.7594, 0.8491, and 0.2981, respectively. The results demonstrate that IMBFN provides meaningful explanations, strong APVD performance, and better generalization than existing methods.
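To make the front end concrete: a SincNet-style layer parameterizes each convolution kernel as a band-pass sinc filter defined only by its cutoff frequencies, and the "amplitude-trainable" variant described above additionally learns a gain per band. The following NumPy sketch illustrates that idea; the kernel length, sample rate, band edges, and gains are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sinc_bandpass(f1, f2, amp, kernel_len=101, fs=16000):
    """Illustrative amplitude-trainable sinc band-pass kernel (assumed form).

    h[n] = amp * (2*f2/fs * sinc(2*f2*n/fs) - 2*f1/fs * sinc(2*f1*n/fs)),
    windowed with a Hamming window. f1 < f2 are cutoff frequencies in Hz;
    amp is the extra per-band trainable gain that AT-SincNet adds on top
    of plain SincNet's two cutoff parameters.
    """
    n = np.arange(kernel_len) - (kernel_len - 1) / 2
    t = n / fs
    # difference of two low-pass sinc kernels yields a band-pass response
    h = 2 * f2 * np.sinc(2 * f2 * t) - 2 * f1 * np.sinc(2 * f1 * t)
    h *= np.hamming(kernel_len)  # taper to reduce spectral leakage
    return amp * h / fs          # amp plays the role of the trainable gain

# Hypothetical 3-band front end with learned band edges and gains:
bands = [(100, 400), (400, 1600), (1600, 6400)]
gains = [1.0, 0.5, 0.25]
kernels = [sinc_bandpass(lo, hi, a) for (lo, hi), a in zip(bands, gains)]
```

In a trainable setting, `f1`, `f2`, and `amp` would be learnable parameters of a 1-D convolution layer, so the filter bank stays interpretable: each kernel is fully described by its band edges and gain.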
