Mel Frequency Cepstral Coefficients Features Research Articles

This study proposes a new deep learning-based method for detecting Covid-19 from cough, breath, and voice signals. The method, named CovidCoughNet, consists of a deep feature extraction network (InceptionFireNet) and a prediction network (DeepConvNet). The InceptionFireNet architecture, based on Inception and Fire modules, was designed to extract important feature maps. The DeepConvNet architecture, which is made up of convolutional neural network blocks, was developed to make predictions from the feature vectors produced by InceptionFireNet. Two datasets were used: the COUGHVID dataset, which contains cough data, and the Coswara dataset, which contains cough, breath, and voice signals. The pitch-shifting technique was used to augment the signal data, which contributed significantly to improving performance. Additionally, Chroma features (CF), Root mean square energy (RMSE), Spectral centroid (SC), Spectral bandwidth (SB), Spectral rolloff (SR), Zero crossing rate (ZCR), and Mel frequency cepstral coefficients (MFCC) were used to extract important features from the voice signals. Experimental studies showed that pitch-shifting improved performance by around 3% compared to raw signals. When the proposed model was used with the COUGHVID dataset (Healthy, Covid-19, and Symptomatic), it achieved 99.19% accuracy, 0.99 precision, 0.98 recall, 0.98 F1-Score, 97.77% specificity, and 98.44% AUC. Similarly, the voice data in the Coswara dataset yielded higher performance than the cough and breath data, with 99.63% accuracy, 100% precision, 0.99 recall, 0.99 F1-Score, 99.24% specificity, and 99.24% AUC. Compared with current studies in the literature, the proposed model achieved highly competitive performance. The code and details of the experimental studies are available on the GitHub page: https://github.com/GaffariCelik/CovidCoughNet.
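
Two of the hand-crafted features named in this abstract, zero crossing rate (ZCR) and root mean square energy (RMSE), are simple enough to sketch directly. The snippet below is a minimal illustration in plain Python, not the CovidCoughNet implementation: the function names, the frame length, the hop size, and the synthetic sine tones are all assumptions chosen for the example.

```python
import math

def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames (frame_len and hop are illustrative choices)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def sine(freq_hz, sample_rate, n_samples):
    """Synthetic test tone standing in for a real recording."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

# Hypothetical demo: a low tone and a high tone sampled at 8 kHz.
SAMPLE_RATE = 8000
low_tone = sine(100.0, SAMPLE_RATE, 1600)
high_tone = sine(1000.0, SAMPLE_RATE, 1600)

for name, tone in (("100 Hz", low_tone), ("1000 Hz", high_tone)):
    first = split_frames(tone, frame_len=800, hop=400)[0]
    print(name, "ZCR:", zero_crossing_rate(first), "RMS:", rms_energy(first))
```

The higher tone changes sign more often per frame, so its ZCR is larger, while both unit-amplitude tones have an RMS close to 1/√2. On real cough or voice recordings these per-frame values would typically be stacked with the other listed features before being passed to a classifier.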

Developing an automatic speaker verification (ASV) system for children is extremely challenging due to the unavailability of children’s speech corpora. The challenges are further exacerbated in the case of short utterances. Voice-based biometric systems require an adequate amount of speech data for enrollment and verification; otherwise, performance degrades considerably. In this paper, we focus on data paucity and on preserving the higher-frequency contents in order to enhance the performance of a short-utterance-based children’s speaker verification system. To deal with data scarcity, several out-of-domain data augmentation techniques are used. Since the out-of-domain data comes from adult speakers, whose speech is acoustically very different from children’s speech, we apply prosody modification, formant modification, and voice conversion to render it acoustically similar to children’s speech prior to augmentation. This not only increases the amount of training data but also captures the missing target attributes effectively. The proposed data augmentation technique achieves a relative improvement of 33.6% in equal error rate (EER) with respect to the baseline system trained solely on the child dataset. Furthermore, to preserve the higher-frequency contents, we concatenate the classical Mel-frequency cepstral coefficient (MFCC) features with either the linear-frequency cepstral coefficient (LFCC) features or the inverse-Mel-frequency cepstral coefficient (IMFCC) features. The Mel filter bank gives poor resolution of higher-frequency components, whereas linear or inverse-Mel filter banks resolve those components better. Moreover, MFCC and IMFCC features exhibit low canonical correlation. Consequently, the frame-level concatenation of MFCC with LFCC or IMFCC features leads to better resolution of both lower- and higher-frequency components, and the EER drops considerably when either LFCC or IMFCC features are concatenated with MFCC features. For the full test set, the EER shows a relative reduction of 10.56% (with respect to the EER for the MFCC features) when IMFCC features are concatenated with the MFCC features. This novel approach of data augmentation followed by frame-level feature concatenation achieves an overall reduction of 40.6% in EER.
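
The resolution argument in this abstract can be made concrete by comparing where the filter centers fall under each scale. The sketch below is a rough illustration, not the authors' code. It assumes the common Mel formula m = 2595·log10(1 + f/700), evenly spaced centers, and an inverse-Mel bank defined as the Mel bank mirrored about the band midpoint; the function names, the filter count, and the 0 to 8000 Hz band are hypothetical choices.

```python
import math

def hz_to_mel(f_hz):
    """Common Mel-scale formula (assumed here; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(n_filters, f_min, f_max):
    """Filter centers spaced evenly on the Mel scale, returned in Hz."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

def linear_centers(n_filters, f_min, f_max):
    """Filter centers spaced evenly in Hz."""
    step = (f_max - f_min) / (n_filters + 1)
    return [f_min + step * (i + 1) for i in range(n_filters)]

def inverse_mel_centers(n_filters, f_min, f_max):
    """Mel centers mirrored about the band midpoint (assumed definition)."""
    return sorted(f_min + f_max - c for c in mel_centers(n_filters, f_min, f_max))

def spacing(centers):
    """Gap in Hz between neighbouring filter centers."""
    return [b - a for a, b in zip(centers, centers[1:])]

# Hypothetical configuration: 20 filters covering 0 to 8000 Hz.
N_FILTERS, F_MIN, F_MAX = 20, 0.0, 8000.0
for name, fn in (("mel", mel_centers),
                 ("linear", linear_centers),
                 ("inverse-mel", inverse_mel_centers)):
    gaps = spacing(fn(N_FILTERS, F_MIN, F_MAX))
    print(f"{name:12s} first gap {gaps[0]:8.1f} Hz   last gap {gaps[-1]:8.1f} Hz")
```

Under these assumptions the Mel centers crowd together at low frequencies and spread out toward the top of the band, the mirrored bank does the opposite, and the linear bank is uniform. Cepstral features computed from banks with complementary spacing therefore emphasise different parts of the spectrum, which is the intuition behind concatenating them frame by frame.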

Articles published on Mel Frequency Cepstral Coefficients Features

Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.

CovidCoughNet: A new method based on convolutional neural networks and deep feature extraction using pitch-shifting data augmentation for covid-19 detection from cough, breath, and voice signals

Automatic speaker verification system for dysarthric speakers using prosodic features and out-of-domain data augmentation

Recognizing Command Words using Deep Recurrent Neural Network for Both Acoustic and Throat Speech

SWMAT: Mel-frequency cepstral coefficients-based memory fingerprinting for IoT devices

Effective preservation of higher-frequency contents in the context of short utterance based children’s speaker verification system

Time-frequency analysis of speech signal using Chirplet transform for automatic diagnosis of Parkinson's disease.

Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications

Tuning Dari Speech Classification Employing Deep Neural Networks

Interfacial debonding detection of steel beams reinforced by CFRP plates based on percussion method

Kiñit classification in Ethiopian chants, Azmaris and modern music: A new dataset and CNN benchmark.

Wavelet-based Parametric Feature Subset Selection for Speaker and Accent Recognition using Genetic Algorithm

A Perspective Study on Speech Recognition

A late fusion deep neural network for robust speaker identification using raw waveforms and gammatone cepstral coefficients

Underwater Acoustic Target Recognition Based on Data Augmentation and Residual CNN

Closed-set automatic speaker identification using multi-scale recurrent networks in non-native children.

Comparative Analysis of LPC and MFCC for Male Speaker Recognition in Text-Independent Context

SPOKEN-DIGIT CLASSIFICATION USING ARTIFICIAL NEURAL NETWORK

The design of intelligent fuzzy cognitive system of music emotion by product supply chain management
