Abstract

This work focuses on deep learning methods, such as feedforward neural networks (FNNs) and convolutional neural networks (CNNs), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters. In total, 518 voice samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological speakers (women and men) producing the vowels /a/, /i/, and /u/ at normal pitch. Significant differences were observed between the normal and the pathological voice signals for normalized skewness (p = 0.000) and kurtosis (p = 0.000), except for normalized kurtosis in the /u/ samples in women (p = 0.051). These parameters are useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCCs parameter for the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOSs for the /i/ vowel samples in women. Combining the acoustic measures with HOS parameters was beneficial for characterization in terms of accuracy. The combination of various parameters and deep learning methods was also useful for distinguishing normal from pathological voices.

Highlights

  • The automatic detection of speech disabilities has attracted significant clinical and academic attention, with the hope of accurately diagnosing speech impairments before they are identified by well-trained experts and expensive equipment

  • Many researchers focus on acoustic analysis, parametric and nonparametric feature extraction, and the automatic detection of speech pathology using pattern recognition algorithms and statistical methods [1,2,3,4]. Pathological voice detection studies using deep learning techniques have also been actively published recently

  • This work is focused on deep learning methods, such as feedforward neural network (FNN) and convolutional neural network (CNN), for the detection of pathological speech using mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstrum coefficients (LPCCs), as well as higher-order statistics (HOSs) parameters

Summary

Introduction

The automatic detection of speech disabilities has attracted significant clinical and academic attention, with the hope of accurately diagnosing speech impairments before they are identified by well-trained experts and expensive equipment. The main motivation for this work is the use of artificial intelligence to diagnose various diseases. This can lead to significant improvements in diagnosis and healthcare, as well as further improvements in human life [11,12]. The originality of this work can be found in its proposal of a new parameter and a novel deep learning method that combines HOSs, MFCCs, and LPCCs in the /a/, /i/, and /u/ voice signals of healthy and pathological individuals. This paper introduces an intelligent pathological voice detection system that supports an accurate and objective diagnosis based on deep learning and the parameters introduced. The experimental results emphasize the superiority of the proposed pathological voice detection system, which integrates machine learning methods and various parameters to monitor and diagnose pathological voices effectively and reliably.
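As a rough illustration of the HOS parameters named in the abstract (normalized skewness and kurtosis), the sketch below computes both from a one-dimensional signal and appends them to an existing acoustic feature vector. It assumes the standard moment-based definitions (third and fourth central moments divided by the appropriate power of the variance); the paper may compute them differently, for example per frame. The function names `normalized_hos` and `append_hos` and the 13-element placeholder vector are hypothetical and do not come from the original work.

```python
import numpy as np


def normalized_hos(signal):
    """Return (skewness, kurtosis) of a 1-D signal.

    Assumption: population central moments, normalized by the variance
    raised to the power 1.5 (skewness) and 2 (kurtosis).
    """
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # variance (second central moment)
    if m2 == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    skewness = np.mean(centered ** 3) / m2 ** 1.5
    kurtosis = np.mean(centered ** 4) / m2 ** 2
    return skewness, kurtosis


def append_hos(acoustic_features, signal):
    """Concatenate an acoustic feature vector with the two HOS values."""
    skew, kurt = normalized_hos(signal)
    features = np.asarray(acoustic_features, dtype=float)
    return np.concatenate([features, [skew, kurt]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    voice = rng.normal(size=16000)   # stand-in for a real vowel recording
    placeholder = np.zeros(13)       # stand-in for MFCC or LPCC values
    print(append_hos(placeholder, voice).shape)
```

The combined vector is the kind of input a classifier such as an FNN could consume; how the original study frames, normalizes, or aggregates these values is not stated in this excerpt.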

Related Work
Database
Feature Extraction
Deep Learning Methods
Experimental Results and Discussion
Conclusions
Objective
