Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

Ji-Yeoun Lee

doi:10.3390/app11157149

Abstract

This work focuses on deep learning methods, namely the feedforward neural network (FNN) and the convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters. In total, 518 voice samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological speakers, both women and men, producing the vowels /a/, /i/, and /u/ at normal pitch. Significant differences between the normal and the pathological voice signals were observed for normalized skewness (p = 0.000) and normalized kurtosis (p = 0.000), with the exception of normalized kurtosis estimated in the /u/ samples in women (p = 0.051). These parameters are therefore useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCCs parameter for the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOSs for the /i/ vowel samples in women. Combining the acoustic measures with HOS parameters improved characterization in terms of accuracy. The combination of various parameters and deep learning methods was also useful for distinguishing normal from pathological voices.
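The normalized skewness and kurtosis named in the abstract can be illustrated with a minimal sketch. It assumes the textbook definitions (third and fourth central moments divided by the appropriate power of the variance); the paper's exact estimator, framing, and any offset such as subtracting 3 from the kurtosis are not given in this excerpt. The function name `normalized_moments` and the toy sine frame are hypothetical.

```python
import math


def normalized_moments(samples):
    """Return (normalized skewness, normalized kurtosis) for a sequence of samples.

    Assumption: population central moments (dividing by n), with
    skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2 (no "-3" offset).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        raise ValueError("constant signal: normalized moments are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical toy frame: five full cycles of a sine wave over 100 samples.
frame = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
skew, kurt = normalized_moments(frame)
print(f"skewness={skew:.4f}, kurtosis={kurt:.4f}")
```

A symmetric waveform such as this sine gives a skewness near zero, while asymmetric or impulsive frames move both values away from those of a clean periodic signal. In a detection pipeline, such per-recording values would typically be appended to the MFCC or LPCC feature vector before classification.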

Highlights

The automatic detection of speech disabilities has attracted significant clinical and academic attention, with the hope of accurately diagnosing speech impairments before they are identified by well-trained experts and expensive equipment.
Many researchers focus on acoustic analysis, parametric and nonparametric feature extraction, and the automatic detection of speech pathology using pattern recognition algorithms and statistical methods [1,2,3,4]. More recently, pathological voice detection studies using deep learning techniques have been actively published.
This work focuses on deep learning methods, namely the feedforward neural network (FNN) and the convolutional neural network (CNN), for the detection of pathological speech using mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstrum coefficients (LPCCs), as well as higher-order statistics (HOSs) parameters.

Summary

Introduction

The automatic detection of speech disabilities has attracted significant clinical and academic attention, with the hope of accurately diagnosing speech impairments before they are identified by well-trained experts and expensive equipment. The main motivation for this work is the use of artificial intelligence to diagnose various diseases. This can lead to significant improvements in diagnosis and healthcare, as well as further improvements in human life [11,12]. The originality of this work lies in its proposal of a new parameter and a novel deep learning method that combines HOSs, MFCCs, and LPCCs in the /a/, /i/, and /u/ voice signals of healthy and pathological individuals. This paper introduces an intelligent pathological voice detection system that supports an accurate and objective diagnosis based on deep learning and the parameters introduced. The experimental results emphasize the superiority of the proposed pathological voice detection system, which integrates machine learning methods with various parameters to monitor and diagnose pathological voices effectively and reliably.

Related Work

Database

Feature Extraction

Deep Learning Methods

Experimental Results and Discussion

Conclusions

Objective

Journal: Applied Sciences
Publication Date: Aug 2, 2021
Citations: 23
License type: CC BY 4.0

Similar Papers

Classification between Elderly Voices and Young Voices Using an Efficient Combination of Deep Learning Classifiers and Various Parameters
Ji-Yeoun Lee
Applied Sciences | VOL. 11
21 Oct 2021

An Efficient SMOTE-Based Deep Learning Model for Voice Pathology Detection
Ji-Na Lee ... Ji-Yeoun Lee
Applied Sciences | VOL. 13
10 Mar 2023

Voice pathology identification system using a deep learning approach based on unique feature selection sets
Nuha Qais Abdulmajeed ... Belal Al‐Khateeb
Expert Systems
03 May 2023

Deep Learning Based Emotion Classification Using Mel Frequency Magnitude Coefficient
Siba Prasad Mishra ... Suman Deb
04 Mar 2023