Abstract

Context, such as scenes and objects, plays an important role in video emotion recognition, and recognition accuracy can be further improved when context information is incorporated. Although previous work has considered context, it often ignores that different images may carry different emotional cues. To address these differences in emotional content across modalities and across images, this paper proposes a hierarchical attention-based multimodal fusion network for video emotion recognition, which consists of a multimodal feature extraction module and a multimodal feature fusion module. The feature extraction module has three subnetworks that extract features from facial, scene, and global images. Each subnetwork consists of two branches: the first extracts the features of its modality, and the second generates an emotion score for each image. The features and emotion scores of all images in a modality are aggregated to produce that modality's emotion feature. The fusion module takes the multimodal features as input and generates an emotion score for each modality; the features and scores of all modalities are then aggregated to produce the final emotion representation of the video. Experimental results show that the proposed method is effective on the emotion recognition dataset.
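To make the two-level aggregation concrete, the following is a minimal sketch of the kind of score-weighted attention pooling the abstract describes. The module names (`AttentionPool`, `HierarchicalFusion`), the feature dimension, and the use of a single linear scoring layer are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionPool(nn.Module):
    """Aggregate per-item features using learned scalar emotion scores.

    Sketch only: the score branch is assumed to be a single linear
    layer; the paper may use a different scoring design.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)  # emotion-score branch

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_items, feat_dim), e.g. features of all images
        # in one modality, or the features of all modalities.
        scores = self.scorer(feats)            # (num_items, 1)
        weights = F.softmax(scores, dim=0)     # scores -> attention weights
        return (weights * feats).sum(dim=0)    # (feat_dim,) pooled feature


class HierarchicalFusion(nn.Module):
    """Two-level fusion: images -> modality features -> video feature."""

    def __init__(self, feat_dim: int, num_modalities: int = 3):
        super().__init__()
        # One image-level pool per modality (face, scene, global).
        self.image_pools = nn.ModuleList(
            [AttentionPool(feat_dim) for _ in range(num_modalities)]
        )
        # Modality-level pool producing the final video representation.
        self.modal_pool = AttentionPool(feat_dim)

    def forward(self, modal_feats: list) -> torch.Tensor:
        # modal_feats: one (num_images, feat_dim) tensor per modality.
        modality_vecs = torch.stack(
            [pool(f) for pool, f in zip(self.image_pools, modal_feats)]
        )
        return self.modal_pool(modality_vecs)  # final emotion representation
```

A call such as `HierarchicalFusion(512)([face_feats, scene_feats, global_feats])` would then yield a single 512-dimensional video emotion feature that a classifier head can consume.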

Highlights

  • Emotion recognition is an important part of comprehensively understanding video scenes

  • With the success of deep convolutional neural networks (CNNs) in image classification and object detection, researchers have attempted to extract face features with deep neural networks to further improve emotion recognition performance [3, 4]

  • We first build a dataset for human emotion recognition in video, named the multimodal human emotion dataset (MHED)


Summary

Introduction

Emotion recognition is an important part of comprehensively understanding video scenes. With the success of deep convolutional neural networks (CNNs) in image classification and object detection, researchers have attempted to extract face features with deep neural networks to further improve emotion recognition performance [3, 4]. However, recognition from a single image cannot model the temporal evolution of emotion expression. Our hierarchical attention-based multimodal fusion network (HAMF) takes the image sequences of face, scene, and context as input and learns a discriminative video emotion representation that makes full use of the differences among modalities and among images.
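As a concrete illustration of one two-branch subnetwork operating on an image sequence, the sketch below pairs a shared CNN backbone with a feature branch and a per-image emotion-score branch. The ResNet-18 backbone and the layer sizes are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models


class TwoBranchSubnet(nn.Module):
    """One modality subnetwork: a feature branch and a score branch.

    Sketch only: ResNet-18 and the 512-d feature size are assumed
    here; the paper may use a different backbone and dimensions.
    """

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose 512-d pooled features
        self.backbone = backbone
        self.feat_branch = nn.Linear(512, feat_dim)  # branch 1: features
        self.score_branch = nn.Linear(512, 1)        # branch 2: emotion score

    def forward(self, images: torch.Tensor):
        # images: (num_frames, 3, H, W), the image sequence of one
        # modality (face, scene, or global/context crops).
        h = self.backbone(images)            # (num_frames, 512)
        return self.feat_branch(h), self.score_branch(h)
```

Three such subnetworks, one per modality, would feed the hierarchical attention fusion sketched after the abstract.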

Related Work
Hierarchical Attention-Based Multimodal Fusion Network
Experiments
Methods
Findings
Conclusions
