People communicate mainly through speech and facial expressions, which reflect their feelings and thoughts; this reflection is an important channel for empathy in human interaction. Today, human emotions can be recognized automatically with the help of artificial intelligence systems. Automatic emotion recognition can increase productivity across human-computer interaction applications, including virtual reality, psychology, and behavior modeling. In this study, we propose a method for improving the accuracy of emotion recognition from speech data. In this method, new features are extracted by convolutional neural networks from the MFCC coefficient matrices of the speech recordings in the Crema-D dataset. Binary particle swarm optimization is then applied to the extracted features to select those most relevant to speech emotion classification, which increases accuracy and reduces the 64 features per recording to 33. In the test results, accuracies of 62.86% with CNN, 63.93% with SVM, and 66.01% with CNN+BPSO+SVM were obtained.
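The feature-selection stage described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: synthetic 64-dimensional feature vectors stand in for the CNN-derived MFCC features, a linear SVM's cross-validated accuracy serves as the BPSO fitness function, and all swarm hyperparameters (swarm size, inertia, acceleration coefficients) are assumed values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for CNN-derived features: 300 samples, 64 features,
# 6 emotion classes (Crema-D distinguishes 6 emotions). Only the first
# 20 features carry class information; the rest are noise.
n, d, k = 300, 64, 6
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, d))
X[:, :20] += y[:, None] * 0.8  # informative features

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"),
                           X[:, mask.astype(bool)], y, cv=3).mean()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Binary PSO: positions are 0/1 feature masks, velocities are real-valued
# and squashed through a sigmoid to give bit-flip probabilities.
swarm, iters, w, c1, c2 = 12, 15, 0.7, 1.5, 1.5  # assumed hyperparameters
pos = rng.integers(0, 2, size=(swarm, d)).astype(float)
vel = rng.normal(scale=0.1, size=(swarm, d))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()
gbest_fit = pbest_fit.max()

for _ in range(iters):
    r1, r2 = rng.random((swarm, d)), rng.random((swarm, d))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = (rng.random((swarm, d)) < sigmoid(vel)).astype(float)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    if fit.max() > gbest_fit:
        gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()

print(f"selected {int(gbest.sum())}/{d} features, CV accuracy {gbest_fit:.3f}")
```

On real data the fitness evaluation would run the same SVM over the CNN feature vectors of the Crema-D recordings; the sigmoid-based position update is the standard binary-PSO variant that keeps the mask discrete while the velocity stays continuous.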