Abstract

Dimensional emotion recognition is an important branch of affective computing that uses continuous values to represent complex human emotions. In this study, we propose a visual–audio emotion recognition system that integrates emotion categories with emotional dimensions. For the visual modality, a rule-based correspondence between emotion categories and emotion-dimension intervals is established, and the respective classifiers are trained and fused using machine learning methods. For the audio modality, several emotion-related features are extracted, and a 128-D global feature is learned through a deep convolutional neural network (DCNN). A combination of Bayesian inference and machine learning integrates the information from the two modalities. We tested the proposed system and its single modalities on the standard CK+ and eNTERFACE '05 databases, and the experimental results and comparisons demonstrate the efficiency of the proposed system. Furthermore, the proposed system represents emotion with a category label and dimension values simultaneously, providing strong interpretability and extensibility for emotion recognition beyond methods that offer only classification or only dimensional values.
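The abstract outlines two ingredients that lend themselves to a brief illustration: a rule-based mapping from emotion categories to emotion-dimension intervals, and a Bayesian combination of the visual and audio modalities. The following Python sketch shows one plausible form of such a decision-level fusion. Everything here is an assumption for illustration only: the class list, the valence–arousal intervals in `CATEGORY_TO_VA`, and the class-conditional independence assumption inside `bayes_fuse` are placeholders, not the paper's actual formulation.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Hypothetical rule-based mapping from category to a (valence, arousal)
# interval in [-1, 1]. The abstract states such a mapping exists but does
# not give the intervals, so these values are placeholders.
CATEGORY_TO_VA = {
    "anger":     ((-0.8, -0.4), (0.5, 0.9)),
    "disgust":   ((-0.7, -0.3), (0.0, 0.4)),
    "fear":      ((-0.8, -0.4), (0.4, 0.8)),
    "happiness": ((0.4, 0.9),   (0.2, 0.7)),
    "sadness":   ((-0.9, -0.4), (-0.7, -0.2)),
    "surprise":  ((0.0, 0.4),   (0.5, 0.9)),
}

def bayes_fuse(p_visual, p_audio, prior=None):
    """Fuse per-class posteriors from two modality classifiers, assuming
    class-conditional independence: p(c | v, a) is proportional to
    p(c | v) * p(c | a) / p(c). This is one common Bayesian late-fusion
    scheme, not necessarily the one used in the paper."""
    p_visual = np.asarray(p_visual, dtype=float)
    p_audio = np.asarray(p_audio, dtype=float)
    if prior is None:
        # Uniform class prior when none is supplied.
        prior = np.full(len(p_visual), 1.0 / len(p_visual))
    fused = p_visual * p_audio / np.asarray(prior, dtype=float)
    return fused / fused.sum()  # renormalize to a proper distribution

# Example: the visual classifier leans toward happiness, the audio one
# toward surprise; fusion weighs both sources of evidence.
p_v = [0.05, 0.05, 0.05, 0.60, 0.05, 0.20]
p_a = [0.05, 0.05, 0.10, 0.30, 0.10, 0.40]
fused = bayes_fuse(p_v, p_a)
label = EMOTIONS[int(np.argmax(fused))]
print(label, fused.round(3), CATEGORY_TO_VA[label])
```

Reporting both the fused category label and its associated dimension interval, as in the last line, mirrors the abstract's claim that the system outputs category and dimension values simultaneously.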
