This paper provides a step-by-step introduction to real-time speech emotion recognition (SER) using a pre-trained image classification network. The procedure has low computational requirements and can be implemented on various voice-based communication platforms, such as mobile phones, call centers, and online communication services. The effects of reduced speech bandwidth and of the mu-law companding procedure used in transmission systems on SER accuracy are examined. The baseline approach achieved an average accuracy of 82% when trained on the Berlin Emotional Speech (EMO-DB) database with seven categorical emotions. Reducing the sampling frequency from the baseline 16 kHz to 8 kHz (i.e., reducing the bandwidth from 8 kHz to 4 kHz) decreased SER accuracy by about 3.3%. The companding procedure alone reduced accuracy by 3.8%, and the combination of both factors reduced the average accuracy by about 7% relative to the baseline. The SER system ran in real time, generating emotional labels every 1.026 to 1.033 seconds. Real-time implementation timelines are presented.
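The mu-law companding transform referred to above can be illustrated with a minimal sketch. The function names and the NumPy-based formulation below are illustrative assumptions, not the paper's implementation; the mu = 255 constant follows the standard ITU-T G.711 convention for 8-bit telephony.

```python
import numpy as np

def mu_law_compand(x, mu=255):
    # Forward mu-law compression of samples normalized to [-1, 1].
    # y = sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255):
    # Inverse (expansion) transform, recovering the original amplitude.
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
```

The forward transform compresses the dynamic range before quantization, which is what distorts the speech signal relative to the 16 kHz linear-PCM baseline; applying the expansion does not undo the quantization loss introduced in between.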