Abstract
This work focuses on deep learning methods, namely a feedforward neural network (FNN) and a convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters. In total, 518 voice samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological speakers (women and men) producing the vowels /a/, /i/, and /u/ at normal pitch. Significant differences between normal and pathological voice signals were observed for normalized skewness (p < 0.001) and normalized kurtosis (p < 0.001), with the single exception of normalized kurtosis estimated from the /u/ samples of women (p = 0.051). These parameters are therefore useful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCC parameters for the /u/ vowel in men. The second-best performance, 80.77%, was obtained by combining the FNN classifier with MFCCs and HOSs for the /i/ vowel in women. Combining the acoustic measures with HOS parameters improved characterization in terms of accuracy, and combining various parameters with deep learning methods was also useful for distinguishing normal from pathological voices.
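Normalized skewness and kurtosis are the third and fourth standardized moments of the signal amplitude distribution. The following NumPy sketch shows one way to compute them. It assumes population (ddof = 0) moments, reports excess kurtosis (the raw value minus 3) by default, and averages per-frame values over a recording; the frame length, hop size, and the helper names `frame_signal` and `hos_features` are hypothetical choices for illustration, not settings taken from the paper.

```python
import numpy as np


def normalized_skewness(x: np.ndarray) -> float:
    """Third standardized moment: E[(x - mu)^3] / sigma^3 (population estimate)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()  # np.std uses ddof=0 by default
    if sigma == 0.0:
        raise ValueError("zero-variance segment: skewness is undefined")
    return float(np.mean((x - mu) ** 3) / sigma ** 3)


def normalized_kurtosis(x: np.ndarray, excess: bool = True) -> float:
    """Fourth standardized moment E[(x - mu)^4] / sigma^4, minus 3 if `excess`."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    if sigma == 0.0:
        raise ValueError("zero-variance segment: kurtosis is undefined")
    k = float(np.mean((x - mu) ** 4) / sigma ** 4)
    return k - 3.0 if excess else k


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def hos_features(x: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Hypothetical per-recording HOS vector: [mean skewness, mean excess kurtosis]."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    skew = np.array([normalized_skewness(f) for f in frames])
    kurt = np.array([normalized_kurtosis(f) for f in frames])
    return np.array([skew.mean(), kurt.mean()])


if __name__ == "__main__":
    fs = 16_000                                # assumed sampling rate for the demo
    n = np.arange(fs)                          # one second of samples
    tone = np.sin(2 * np.pi * 100 * n / fs)    # 100 whole periods of a 100 Hz sine
    print(normalized_skewness(tone))           # approximately 0 (symmetric waveform)
    print(normalized_kurtosis(tone))           # approximately -1.5 (pure sinusoid)
    print(hos_features(tone))
```

A pure sinusoid is used only as a sanity check: its amplitude distribution is symmetric (skewness near 0) and flatter than a Gaussian (excess kurtosis of -1.5), so deviations from these values in a sustained vowel reflect waveform asymmetry and peakedness.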
Highlights
The automatic detection of speech disabilities has attracted significant clinical and academic attention, with the hope of accurately diagnosing speech impairments before they are identified by well-trained experts and expensive equipment.
Many researchers focus on acoustic analysis, parametric and nonparametric feature extraction, and the automatic detection of speech pathology using pattern recognition algorithms and statistical methods [1,2,3,4]; more recently, pathological voice detection studies using deep learning techniques have been actively published.
This work focuses on deep learning methods, namely a feedforward neural network (FNN) and a convolutional neural network (CNN), for the detection of pathological speech using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters.
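LPCCs are cepstral coefficients derived from a linear prediction (all-pole) model of each frame rather than directly from its spectrum. A minimal NumPy sketch of one common route follows: Hamming-window the frame, estimate the predictor coefficients from the autocorrelation with the Levinson-Durbin recursion, then convert them to cepstral coefficients with the standard LPC-to-cepstrum recursion. The prediction order of 12, the number of cepstral coefficients, and all function names are assumptions for illustration; the source does not state the authors' settings.

```python
import numpy as np


def autocorr(frame: np.ndarray, order: int) -> np.ndarray:
    """Biased autocorrelation r[0..order] of a 1-D frame."""
    full = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1                      # index of lag 0
    return full[mid:mid + order + 1]


def levinson_durbin(r: np.ndarray, order: int):
    """Solve for predictor coefficients a[1..p] in x[n] ~ sum_k a[k] x[n-k].

    Returns (a, err): a has length `order`, err is the final prediction-error power.
    """
    a = np.zeros(order + 1)                   # a[0] is unused
    err = float(r[0])
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a[k] z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def lpcc(frame: np.ndarray, order: int = 12, n_ceps: int = 12) -> np.ndarray:
    """LPCCs of one frame (the gain term c[0] is omitted)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = autocorr(x, order)
    if r[0] == 0.0:
        raise ValueError("silent frame: LPC model is undefined")
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.normal(size=4096)
    x = np.zeros_like(e)
    for t in range(1, len(e)):
        x[t] = 0.9 * x[t - 1] + e[t]          # synthetic AR(1) test signal
    print(lpcc(x, order=2, n_ceps=4))        # roughly 0.9**n / n for n = 1..4
```

The AR(1) demo is a convenient check because its exact cepstrum is known in closed form: for a single pole at 0.9, the n-th cepstral coefficient is 0.9^n / n.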
Summary
The automatic detection of speech disabilities has attracted significant clinical and academic attention, with the hope of accurately diagnosing speech impairments before they are identified by well-trained experts and expensive equipment. The main motivation for this work is the use of artificial intelligence to diagnose various diseases, which can lead to significant improvements in diagnosis and healthcare and, in turn, in the quality of human life [11,12]. The originality of this work lies in its proposal of a new parameter and a novel deep learning method that combines HOSs, MFCCs, and LPCCs extracted from the /a/, /i/, and /u/ voice signals of healthy and pathological individuals. This paper introduces an intelligent pathological voice detection system that supports accurate and objective diagnosis based on deep learning and the introduced parameters. The experimental results emphasize the superiority of the proposed system, which integrates machine learning methods with various parameters to monitor and diagnose pathological voices effectively and reliably.
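The source does not specify how the HOS, MFCC, and LPCC parameters are combined or how the FNN is configured. As a hedged illustration only, the sketch below assumes feature-level fusion (concatenating per-recording feature vectors), z-score standardization, and a one-hidden-layer ReLU network with a sigmoid output trained by full-batch gradient descent on binary cross-entropy, written in plain NumPy. The names `fuse_features`, `standardize`, `TinyFNN`, and `accuracy`, the layer size, and the learning rate are all hypothetical, and the demo data are synthetic, not SVD recordings.

```python
import numpy as np


def fuse_features(*blocks: np.ndarray) -> np.ndarray:
    """Concatenate per-recording feature matrices (n_samples, d_i) column-wise."""
    return np.hstack([np.asarray(b, dtype=float) for b in blocks])


def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each feature column, leaving constant columns unscaled."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)


class TinyFNN:
    """One hidden ReLU layer, sigmoid output, mean binary cross-entropy loss."""

    def __init__(self, n_in: int, n_hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    def _forward(self, X):
        h_pre = X @ self.W1 + self.b1
        h = np.maximum(h_pre, 0.0)                         # ReLU
        logits = np.clip(h @ self.W2 + self.b2, -30.0, 30.0)  # avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-logits))                  # sigmoid
        return h_pre, h, p[:, 0]

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        return self._forward(X)[2]

    def fit(self, X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
        n = len(X)
        for _ in range(epochs):
            h_pre, h, p = self._forward(X)
            d_logits = ((p - y) / n)[:, None]               # dLoss/dlogits for sigmoid + BCE
            d_h_pre = (d_logits @ self.W2.T) * (h_pre > 0)  # backprop through ReLU
            self.W2 -= lr * (h.T @ d_logits)
            self.b2 -= lr * d_logits.sum(axis=0)
            self.W1 -= lr * (X.T @ d_h_pre)
            self.b1 -= lr * d_h_pre.sum(axis=0)
        return self


def accuracy(y_true: np.ndarray, p: np.ndarray, threshold: float = 0.5) -> float:
    return float(np.mean((p >= threshold) == (y_true == 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100                                               # synthetic recordings per class
    mfcc_like = np.vstack([rng.normal(0.0, 1.0, (n, 13)), rng.normal(0.5, 1.0, (n, 13))])
    hos_like = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
    y = np.r_[np.zeros(n), np.ones(n)]                    # 0 = healthy, 1 = pathological
    X = standardize(fuse_features(mfcc_like, hos_like))
    model = TinyFNN(X.shape[1]).fit(X, y, lr=0.1, epochs=1000)
    print(f"training accuracy: {accuracy(y, model.predict_proba(X)):.3f}")
```

Feature-level fusion is the simplest way to let one classifier see several parameter families at once; a real evaluation would report accuracy on held-out recordings rather than on the training set shown here.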