Voice Pathology Detection System Research Articles

To distinguish pathological voices from healthy voices, automatic voice pathology detection systems can be built using machine learning (ML) and deep learning (DL) techniques. To fully exploit such systems, large quantities of training data are typically required. The amount of training data is, however, small in the area of pathological voice, and therefore data augmentation (DA) becomes a potential technology to artificially increase the quantity of training data. This study presents a systematic comparison between various DA methods in the detection of pathological voice, including three time domain methods (noise addition, pitch shifting and time stretching), one time–frequency domain method (SpecAugment), and two vocoder-based methods (harmonic-to-noise ratio (HNR) modification and glottal pulse length modification). Detection systems were built using four popular spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram and mel-spectrogram). As classifiers, two widely used ML models (support vector machine (SVM) and random forest (RF)) and two DL models (long short-term memory (LSTM) network and convolutional neural network (CNN) with 1-dimensional (1-D) and 2-dimensional (2-D) architectures) were used. These systems were trained using a small number of training samples from two popular databases of pathological voice (HUPA and SVD) to find the best feature/classifier combination for each database. As a result, one ML-based detection system (mel-spectrogram/SVM for HUPA and SVD) and two DL-based detection systems (dynamic MFCCs/2-D CNN for HUPA and mel-spectrogram/2-D CNN for SVD) were selected for the comparison of the DA methods. The results show that by using DA in the system training, detection accuracy increased compared to the baseline systems that were trained without using DA. This improvement in accuracy was, however, clearly larger for the 2D-CNN system than for the SVM system. Furthermore, all six DA methods improved accuracy of the 2-D CNN system compared to the baseline system for both databases. The highest improvements were achieved using the time–frequency domain SpecAugment DA method, which improved accuracy by 1.5% and 3.8% (absolute) for the HUPA and SVD database, respectively.

Read full abstract

Diseases of internal organs other than the vocal folds can also affect a person's voice. As a result, voice problems are on the rise, even though they are frequently overlooked. According to a recent study, voice pathology detection systems can successfully help the assessment of voice abnormalities and enable the early diagnosis of voice pathology. For instance, in the early identification and diagnosis of voice problems, the automatic system for distinguishing healthy and diseased voices has gotten much attention. As a result, artificial intelligence-assisted voice analysis brings up new possibilities in healthcare. The work was aimed at assessing the utility of several automatic speech signal analysis methods for diagnosing voice disorders and suggesting a strategy for classifying healthy and diseased voices. The proposed framework integrates the efficacy of three voice characteristics: chroma, mel spectrogram, and mel frequency cepstral coefficient (MFCC). We also designed a deep neural network (DNN) capable of learning from the retrieved data and producing a highly accurate voice-based disease prediction model. The study describes a series of studies using the Saarbruecken Voice Database (SVD) to detect abnormal voices. The model was developed and tested using the vowels /a/, /i/, and /u/ pronounced in high, low, and average pitches. We also maintained the “continuous sentence” audio files collected from SVD to select how well the developed model generalizes to completely new data. The highest accuracy achieved was 77.49%, superior to prior attempts in the same domain. Additionally, the model attains an accuracy of 88.01% by integrating speaker gender information. The designed model trained on selected diseases can also obtain a maximum accuracy of 96.77% (cordectomy × healthy). As a result, the suggested framework is the best fit for the healthcare industry.

Read full abstract

Voice Pathology Detection System Research Articles

Related Topics

Articles published on Voice Pathology Detection System

A Robust Voice Pathology Detection System Based on the Combined BiLSTM–CNN Architecture

A comparison of data augmentation methods in voice pathology detection

MMHFNet: Multi-modal and multi-layer hybrid fusion network for voice pathology detection

An Efficient SMOTE-Based Deep Learning Model for Voice Pathology Detection

Voice Pathology Detection Using a Two-Level Classifier Based on Combined CNN–RNN Architecture

An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks.

Multi-modal voice pathology detection architecture based on deep and handcrafted feature fusion

Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals

Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method

Convergence of Artificial Intelligence and Internet of Things in Smart Healthcare: A Case Study of Voice Pathology Detection

Kullback–Leibler divergence and sample skewness for pathological voice quality assessment

Implementation of voice pathology detection system using feature selection

A Solution to the Security Authentication Problem in Smart Houses Based on Speech

Towards robust voice pathology detection

Voice Pathology Detection Using Deep Learning on Mobile Healthcare Framework

Smart Health Solution Integrating IoT and Cloud: A Case Study of Voice Pathology Monitoring

Automatic voice pathology detection and classification using vocal tract area irregularity

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Voice Pathology Detection System Research Articles

Related Topics

Articles published on Voice Pathology Detection System

A Robust Voice Pathology Detection System Based on the Combined BiLSTM–CNN Architecture

A comparison of data augmentation methods in voice pathology detection

MMHFNet: Multi-modal and multi-layer hybrid fusion network for voice pathology detection

An Efficient SMOTE-Based Deep Learning Model for Voice Pathology Detection

Voice Pathology Detection Using a Two-Level Classifier Based on Combined CNN–RNN Architecture

An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks.

Multi-modal voice pathology detection architecture based on deep and handcrafted feature fusion

Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals

Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method

Convergence of Artificial Intelligence and Internet of Things in Smart Healthcare: A Case Study of Voice Pathology Detection

Kullback–Leibler divergence and sample skewness for pathological voice quality assessment

Implementation of voice pathology detection system using feature selection

A Solution to the Security Authentication Problem in Smart Houses Based on Speech

Towards robust voice pathology detection

Voice Pathology Detection Using Deep Learning on Mobile Healthcare Framework

Smart Health Solution Integrating IoT and Cloud: A Case Study of Voice Pathology Monitoring

Automatic voice pathology detection and classification using vocal tract area irregularity