Abstract

With the rapid development of deep learning and wireless communication technology, emotion recognition has received increasing attention from researchers. Computers can be truly intelligent only when they understand human emotions, and emotion recognition is a primary prerequisite for that goal. This paper proposes a multimodal emotion recognition model based on a multiobjective optimization algorithm. The model combines voice information and facial information and can simultaneously optimize the accuracy and the uniformity of recognition. The speech modality is based on an improved deep convolutional neural network (DCNN); the video image modality is based on an improved depthwise separable convolutional neural network (DSCNN). After single-modality recognition, a multiobjective optimization algorithm fuses the two modalities at the decision level. The experimental results show that the proposed model improves on every evaluation index, with an emotion recognition accuracy 2.88% higher than that of the ISMS_ALA model. These results show that the multiobjective optimization algorithm can effectively improve the performance of the multimodal emotion recognition model.
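
The abstract describes the fusion step only at a high level, so the following Python sketch illustrates the general idea rather than the authors' actual algorithm: per-modality class probabilities are combined with a fusion weight w, and the candidates that are Pareto-optimal with respect to two objectives, overall accuracy and uniformity (evenness of per-class recall), are kept. The toy data, the six-class setup, and all function names here are invented for illustration.

```python
import numpy as np

# Hypothetical decision-level fusion sketch (not the paper's exact method).
rng = np.random.default_rng(0)
N_SAMPLES, N_CLASSES = 500, 6  # six emotion categories (assumed)

labels = rng.integers(0, N_CLASSES, N_SAMPLES)

def fake_probs(noise):
    """Toy class-probability outputs standing in for a trained network."""
    logits = np.eye(N_CLASSES)[labels] + noise * rng.standard_normal((N_SAMPLES, N_CLASSES))
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

speech_probs = fake_probs(1.5)  # stand-in for the DCNN speech-modality output
video_probs = fake_probs(1.0)   # stand-in for the DSCNN video-modality output

def objectives(w):
    """Return (accuracy, uniformity) for fusion weight w in [0, 1]."""
    fused = w * speech_probs + (1.0 - w) * video_probs
    pred = fused.argmax(axis=1)
    acc = (pred == labels).mean()
    per_class = np.array([(pred[labels == c] == c).mean() for c in range(N_CLASSES)])
    # Uniformity: negate the spread of per-class recall so higher is better.
    return acc, -per_class.std()

# Enumerate candidate weights and keep the non-dominated (Pareto-optimal) set.
candidates = [(w, *objectives(w)) for w in np.linspace(0, 1, 101)]
pareto = [c for c in candidates
          if not any(o[1] >= c[1] and o[2] >= c[2] and o[1:] != c[1:] for o in candidates)]

for w, acc, uni in pareto:
    print(f"w={w:.2f}  accuracy={acc:.3f}  per-class recall std={-uni:.3f}")
```

Scanning a single scalar weight is only a minimal stand-in for the paper's optimizer; any multiobjective method (NSGA-II, for example) could search the same two objectives over a richer fusion parameterization.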

Highlights

  • The concept of “affective computing” was first proposed by Professor Picard of the Massachusetts Institute of Technology in the book Affective Computing, published in 1997

  • A multiobjective optimization algorithm is used to simultaneously optimize the accuracy of model recognition and the uniformity of emotion recognition

  • This paper presents a multimodal emotion recognition model based on a multiobjective optimization algorithm

Introduction

The concept of “affective computing” was first proposed by Professor Picard of the Massachusetts Institute of Technology in the book Affective Computing, published in 1997. She defined affective computing as computing that relates to human emotion, arises from emotion, or can influence emotion [1]. The external expression of human emotion mainly includes voice, facial expression, posture, and so on. Human speech contains linguistic information as well as nonlinguistic information such as the speaker's emotional state; speech can express emotion because it contains parameters that reflect emotional characteristics. Facial expression is likewise an important external form of emotion that carries substantial emotional information. Research on facial expression recognition can effectively promote the development of emotion recognition and of automatic image understanding by computers [4,5,6].
