Abstract
Speech emotion recognition is a challenging and widely studied topic in speech processing. Existing models often achieve limited accuracy on speech emotion recognition tasks and generalize poorly. Because the feature set and the model design directly affect recognition accuracy, research on both features and models is important. Since emotional expression is correlated with the global features, local features, and model design of speech, a universal solution for effective speech emotion recognition is difficult to find. Accordingly, the main purpose of this paper is to generate general emotion features from speech signals from different angles and to use an ensemble learning model for the emotion recognition task. The work is divided into the following aspects: (1) Three expert roles for speech emotion recognition are designed. Expert 1 focuses on three-dimensional feature extraction from local signals; expert 2 focuses on extracting comprehensive information from local data; and expert 3 emphasizes global features: acoustic low-level descriptors (LLDs), high-level statistics functionals (HSFs), and local features with their temporal relationships. A single- or multi-level deep learning model matching each expert's characteristics is designed, drawing on the convolutional neural network (CNN), bi-directional long short-term memory (BLSTM), and gated recurrent unit (GRU); a convolutional recurrent neural network (CRNN) combined with an attention mechanism is used for the internal training of the experts. (2) An ensemble learning model is designed so that each expert can play to its own strengths and evaluate speech emotion from a different focus. (3) Experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus compare the emotion recognition performance of the individual experts and the ensemble learning model, verifying the validity of the proposed model.
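To make the division of labor concrete, the sketch below shows one plausible realization of the three experts in PyTorch. All layer sizes, input dimensions, and the attention-pooling details are illustrative assumptions (the abstract does not specify exact configurations), and the four-class label set is only the common IEMOCAP evaluation subset.

import torch
import torch.nn as nn

NUM_EMOTIONS = 4  # assumption: the common angry/happy/neutral/sad IEMOCAP subset

class CNNExpert(nn.Module):
    """Expert 1 (sketch): CNN over a three-channel 'image' of local features,
    e.g., a log-mel spectrogram with its delta and delta-delta channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, NUM_EMOTIONS),
        )

    def forward(self, x):  # x: (batch, 3, mel_bins, frames)
        return self.net(x)

class BLSTMExpert(nn.Module):
    """Expert 2 (sketch): BLSTM over frame-level LLD sequences, with attention
    pooling so emotionally salient frames receive larger weights."""
    def __init__(self, n_lld=40, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_lld, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, NUM_EMOTIONS)

    def forward(self, x):  # x: (batch, frames, n_lld)
        h, _ = self.rnn(x)
        w = torch.softmax(self.attn(h), dim=1)  # per-frame attention weights
        return self.out((w * h).sum(dim=1))     # attention-weighted pooling

class GRUExpert(nn.Module):
    """Expert 3 (sketch): GRU over segment-level HSF vectors, modeling global
    statistics and their ordering across the utterance."""
    def __init__(self, n_hsf=88, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(n_hsf, hidden, batch_first=True)
        self.out = nn.Linear(hidden, NUM_EMOTIONS)

    def forward(self, x):  # x: (batch, segments, n_hsf)
        _, h = self.rnn(x)
        return self.out(h[-1])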
Highlights
As the most convenient and natural medium of human communication, speech is the most basic and direct way we have to transmit information to each other.
Focusing on the above problems, this paper carries out research on the design of speech emotion features with a multi-level deep learning model and constructs ensemble learning schemes that comprehensively consider multiple experts' suggestions [3].
In [16], a deep retinal convolutional neural network is proposed for speech emotion recognition (SER), with advanced features learned from the spectrogram, and it surpasses previous studies in emotion recognition accuracy.
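For context, a typical spectrogram front end for such CNN-based SER systems can be computed with librosa as below; the sampling rate, FFT size, hop length, and mel-band count are common defaults, not parameters taken from [16].

import librosa
import numpy as np

def log_mel_spectrogram(wav_path, sr=16000, n_mels=40):
    """Load an utterance and return an (n_mels, frames) log-mel spectrogram."""
    y, _ = librosa.load(wav_path, sr=sr)          # resample to a fixed rate
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=512, hop_length=160, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)   # log compression in dB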
Summary
As the most convenient and natural medium of human communication, speech is the most basic and direct way we have to transmit information to each other. The decision-making stage of a speech emotion recognition model often plays a decisive role; if the state of an expert is unstable at that point, it directly affects the final emotional judgment. Against this research background, some scholars are working to overcome these problems and improve the speech emotion recognition rate, but few have fully explored the correlation between global and local features across different roles, features, and models. Focusing on the above problems, this paper carries out research on the design of speech emotion features with a multi-level deep learning model and constructs ensemble learning schemes that comprehensively consider multiple experts' suggestions [3]. The fifth part summarizes the work of this paper and discusses prospects for future work.
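As one concrete reading of "comprehensive consideration of multiple experts' suggestions", the sketch below fuses the experts' class posteriors by weighted soft voting. The paper's actual fusion rule is not given in this summary, and the uniform weights are an assumption.

import torch

def ensemble_predict(experts, inputs, weights=None):
    """Fuse expert opinions by (weighted) averaging of softmax posteriors.
    experts: list of trained models; inputs: matching list of input tensors."""
    probs = [torch.softmax(m(x), dim=-1) for m, x in zip(experts, inputs)]
    if weights is None:                    # assumption: equally trusted experts
        weights = [1.0 / len(probs)] * len(probs)
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(dim=-1)            # predicted emotion per utterance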