MFCC Features Research Articles

The objective of automated speech emotion recognition (SER) is to recognize each emotion in a speech signal uniquely and efficiently using machines such as computers and mobile devices. SER has become popular among researchers because of its wide practical applicability. It has proven advantageous in several domains, including medical treatment, security systems, surveillance, online marketing, online education, online search engines, personal communication, customer relationship management, human-machine interaction, and many others. Many authors have combined multiple features (acoustic, non-acoustic, or both) and classifiers (machine learning, deep learning, or both) to improve emotion classification. In the present work, our objective is to enhance the performance of emotion classification (PECL) by combining variational mode decomposition (VMD) and the Hilbert transform (HT). Using VMD, we decomposed each speech signal frame into several sub-signals, or intrinsic mode functions (IMFs). HT is then applied to each VMD-based IMF to obtain the mode instantaneous amplitude (MIA) and mode instantaneous frequency (MIF) signal vectors. From each MIA and MIF signal vector we extracted the proposed features: HT-based approximate entropy (HTAE), HT-based permutation entropy (HTPE), HT-based increment entropy (HTIE), and HT-based sample entropy (HTSE). The combination of these features is called the HT-based entropy (HTE) features. We then assessed the PECL using the HTE and MFCC features, alone and in combination, with a deep neural network (DNN) classifier. The experimental results showed that the combined features (MFCC + HTE) with a DNN classifier outperformed the individual features, achieving an SER accuracy of 86.92% on the EMOVO dataset and 91.63% on the EMO-DB dataset.
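
The Hilbert-transform step and one of the entropy features described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the VMD stage is skipped and a synthetic amplitude-modulated tone stands in for one IMF. The function names, the sampling rate, the test signal, and the permutation-entropy settings (`order=3`, `delay=1`) are all assumptions chosen for illustration. The analytic signal is computed with a naive DFT so that the example needs only the Python standard library.

```python
import cmath
import math
from collections import Counter


def analytic_signal(x):
    """Analytic signal of a real sequence, via a naive O(N^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0  # keep DC (and Nyquist when n is even) unchanged
        elif k < (n + 1) // 2:
            gain = 2.0  # double the positive frequencies
        else:
            gain = 0.0  # discard the negative frequencies
        spectrum[k] *= gain
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_amplitude_frequency(x, fs):
    """Return (amplitude per sample, frequency in Hz per sample step)."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    phase = [cmath.phase(v) for v in z]
    frequency = []
    for previous, current in zip(phase, phase[1:]):
        step = current - previous
        # Wrap the phase step into [-pi, pi) before converting to Hz.
        step = (step + math.pi) % (2 * math.pi) - math.pi
        frequency.append(step * fs / (2 * math.pi))
    return amplitude, frequency


def permutation_entropy(series, order=3, delay=1):
    """Normalised permutation entropy (ordinal-pattern entropy) in [0, 1]."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series is too short for the chosen order and delay")
    counts = Counter(
        tuple(sorted(range(order), key=lambda i: series[start + i * delay]))
        for start in range(n_patterns)
    )
    entropy = -sum(
        (c / n_patterns) * math.log(c / n_patterns) for c in counts.values()
    )
    return entropy / math.log(math.factorial(order))


# Hypothetical stand-in for one IMF: a 50 Hz tone with a slow 5 Hz envelope.
fs = 1000
mode = [
    (1 + 0.5 * math.cos(2 * math.pi * 5 * t / fs)) * math.cos(2 * math.pi * 50 * t / fs)
    for t in range(200)
]
mia, mif = instantaneous_amplitude_frequency(mode, fs)
htpe_mia = permutation_entropy(mia)  # entropy feature computed on the MIA vector
htpe_mif = permutation_entropy(mif)  # and on the MIF vector
```

In practice a library routine such as an FFT-based Hilbert transform would replace the naive DFT; the quadratic version is used here only to keep the sketch dependency-free.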

The vocalization of infants, commonly known as baby crying, is one of the primary means by which infants communicate their needs and emotional states to adults. Although crying can yield crucial insights into a baby's well-being and comfort, little research has specifically investigated how the audio duration of a baby cry influences machine learning classification outcomes. The purpose of this study is to determine the impact of the duration of an infant's cry on the outcomes of machine learning classification and to assess the accuracy and F1 score obtained with the machine learning methods. The contribution is to enrich the understanding of classification and feature selection in audio datasets, particularly in the context of baby cry audio. The dataset used, known as donate-a-cry-corpus, comprises five distinct data classes and clips with a duration of seven seconds. The methodology consists of the spectrogram technique, cross-validation for data partitioning, MFCC feature extraction with 10, 20, and 30 coefficients, and machine learning models including Support Vector Machine, Random Forest, and Naïve Bayes. The findings reveal that the Random Forest model achieved an accuracy of 0.844 and an F1 score of 0.773 when 10 MFCC coefficients were used and the optimal audio duration was set at six seconds. The Support Vector Machine model with an RBF kernel yielded an accuracy of 0.836 and an F1 score of 0.761, while the Naïve Bayes model achieved an accuracy of 0.538 and an F1 score of 0.539. Notably, no discernible differences were observed for the Support Vector Machine and Naïve Bayes methods across the 1 to 7 second trials. This research lays a foundation for advancing techniques for the early identification of illness based on infant vocalizations, thereby facilitating swifter diagnosis by pediatric practitioners.
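
The two quantities varied and reported in this abstract, the clip duration and the accuracy/F1 pair, can be made concrete with a small sketch. It is an illustration under stated assumptions, not the study's code: the helper names, the toy sample rate, and the class labels are hypothetical, and macro averaging of F1 is assumed because the abstract does not say which averaging was used.

```python
def crop_to_duration(samples, sample_rate, seconds):
    """Keep only the first `seconds` of a clip given as a list of samples."""
    return samples[: int(round(seconds * sample_rate))]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Duration sweep on a toy 7-second clip sampled at a hypothetical 10 Hz.
clip = list(range(70))
cropped_lengths = [len(crop_to_duration(clip, 10, s)) for s in range(1, 8)]

# Hypothetical imbalanced labels: a classifier that always predicts the
# majority class scores well on accuracy but poorly on macro F1.
y_true = ["class_a"] * 8 + ["class_b"] * 2
y_pred = ["class_a"] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Under these assumptions, a gap between accuracy and F1 like the one reported for Random Forest would be consistent with uneven class sizes, which is why both metrics are worth reporting together.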

Articles published on MFCC Features

Post-Stroke Dysarthria Voice Recognition based on Fusion Feature MSA and 1D

Identification of Speaker from Disguised Voice Using MFCC Feature Extraction, Chi-Square and Classification Technique

Optimizing avian species recognition with MFCC features and deep learning models

Automatic Age and Gender Recognition Using Ensemble Learning

Emotion Recognition in Lhasa Tibetan Speech based on Bi-LSTM Graph Convolutional Networks

Speaker Identification Using MFCC Feature Extraction ANN Classification Technique

Text-to-Speech Synthesis for Hindi Language Using MFCC and LPC Feature Extraction Techniques

Enhancing Qur'anic Recitation Experience with CNN and MFCC Features for Emotion Identification

Emotional voice conversion using DBiLSTM-NN with MFCC and LogF0 features

ADAM optimised human speech emotion recogniser based on statistical information distribution of chroma, MFCC, and MBSE features

Speech emotion recognition using a combination of variational mode decomposition and Hilbert transform

Enhancing robotic manipulator fault detection with advanced machine learning techniques

Human Scream Detection and Analysis to Control Crime Rate using Machine Learning

A multi-level power grid enhanced identity authentication data management platform based on filtering algorithms

A deep CNN-based acoustic model for the identification of lung diseases utilizing extracted MFCC features from respiratory sounds

A Comparative Study of Machine Learning Methods for Baby Cry Detection Using MFCC Features

A novel Approach for Audio-based Video Analysis via MFCC Features

Enhancing Audio Classification Through MFCC Feature Extraction and Data Augmentation with CNN and RNN Models

Mining the Performance Characteristics of Yao Nationality Musical Instruments under Multivariate Statistical Analysis
