Emotion Recognition from Large-Scale Video Clips with Cross-Attention and Hybrid Feature Weighting Neural Networks.

Siwei Zhou, Xuemei Wu, Qionghao Huang, Fan Jiang, Changqin Huang

doi:10.3390/ijerph20021400


Open Access

https://doi.org/10.3390/ijerph20021400


Abstract

Human emotion is an important indicator of mental state, e.g., satisfaction or stress, and recognizing emotion from different media is essential for sequence analysis and for applications such as mental health assessment, job stress level estimation, and tourist satisfaction assessment. Emotion recognition based on computer vision, which detects emotion from visual media (e.g., images or videos) of human behavior using plentiful emotional cues, has been extensively investigated because of these applications. However, most existing models neglect inter-feature interaction and fuse features by simple concatenation, so they fail to capture the complementary gains between face and context information in video clips, which are important for resolving emotion confusion and emotion misunderstanding. Accordingly, to fully exploit the complementary information between face and context features, we present a novel cross-attention and hybrid feature weighting network for accurate emotion recognition from large-scale video clips. The proposed model consists of a dual-branch encoding (DBE) network, a hierarchical-attention encoding (HAE) network, and a deep fusion (DF) block. Specifically, the face and context encoding blocks in the DBE network generate the respective shallow features. The HAE network then uses a cross-attention (CA) block to capture the complementarity between facial expression features and their contexts via a cross-channel attention operation. An element recalibration (ER) block revises the feature map of each channel by embedding global information. Moreover, an adaptive-attention (AA) block in the HAE network infers the optimal feature fusion weights and obtains the adaptive emotion features via a hybrid feature weighting operation. Finally, the DF block integrates these adaptive emotion features to predict an individual's emotional state. Extensive experimental results on the CAER-S dataset demonstrate the effectiveness of our method and its potential for analyzing tourist reviews with video clips, estimating job stress levels from visual emotional evidence, and assessing mental health from visual media.
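The abstract names a cross-channel attention step and a weighted fusion step but gives no equations, so the following is only a toy sketch of the general idea, not the authors' model. Everything in it is an assumption made for illustration: the function names `cross_channel_gate` and `adaptive_fusion` are hypothetical, the gate is a parameter-free sigmoid rather than a learned layer, the branch scores are plain mean activations, and the inputs stand for per-channel descriptors of the face and context branches.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a per-channel gate."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_channel_gate(target, source):
    """Rescale each channel of `target` by a gate computed from `source`.

    Hypothetical stand-in for a cross-attention step: one branch decides
    how strongly each channel of the other branch is kept.
    """
    return [t * sigmoid(s) for t, s in zip(target, source)]


def adaptive_fusion(face, context):
    """Fuse two branches with softmax weights instead of concatenation.

    Hypothetical stand-in for adaptive feature weighting: each branch is
    scored by its mean activation (an assumption), the scores become
    weights that sum to 1, and the output is the weighted sum.
    """
    scores = [sum(face) / len(face), sum(context) / len(context)]
    w_face, w_ctx = softmax(scores)
    fused = [w_face * f + w_ctx * c for f, c in zip(face, context)]
    return fused, (w_face, w_ctx)


# Made-up 4-channel descriptors for the face and context branches.
face = [0.5, -1.0, 2.0, 0.0]
context = [1.0, 0.0, -0.5, 0.3]

face_att = cross_channel_gate(face, context)   # face gated by context
ctx_att = cross_channel_gate(context, face)    # context gated by face
fused, weights = adaptive_fusion(face_att, ctx_att)
```

In contrast to simple concatenation, the weighted sum lets the contribution of each branch vary per input. In a trained network the gates and scores would come from learned layers rather than the fixed functions used here.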


Journal: International Journal of Environmental Research and Public Health
Publication Date: Jan 12, 2023
Citations: 8
License type: CC BY 4.0


