Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition

Dung Nguyen, Kien Nguyen, Sridha Sridharan, David Dean, Clinton Fookes

doi:10.1016/j.cviu.2018.06.005

Abstract

Multimodal emotion recognition has attracted great interest recently, and numerous methodologies have been investigated. However, the task requires the effective fusion of multimodal representations in the audio and video domains, and existing approaches still perform poorly on such a challenging task. This paper proposes a novel framework for recognizing emotion from multiple sources, including facial expression, pose, body movements, and voice. In this framework, we first introduce new deep spatio-temporal features by cascading 3-dimensional convolutional neural networks (C3Ds) and deep belief networks (DBNs) to effectively model the spatial and temporal information present in video and audio for emotion recognition. We subsequently propose a new feature-level fusion approach based on bilinear pooling theory to combine the visual and audio feature vectors. The proposed fusion strategy allows all elements of the component vectors to interact with each other, thereby capturing the complex and intrinsic associations between the component modalities. Extensive experiments conducted on the eNTERFACE and FABO multimodal emotion databases demonstrate that our proposed system leads to improved multimodal emotion recognition performance and significantly outperforms recent state-of-the-art approaches.
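The fusion idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so the code below is an assumption: it follows the commonly used compact bilinear pooling construction, in which each feature vector is projected with a random Count Sketch and the two sketches are combined by circular convolution. All function names, vector sizes, and the output dimension `d` are hypothetical and chosen only for illustration.

```python
import random


def make_sketch_params(n, d, rng):
    """Draw one random bucket in [0, d) and one random sign per input dimension."""
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return h, s


def count_sketch(x, h, s, d):
    """Project x into d buckets: each entry is added, with its sign, to its bucket."""
    y = [0.0] * d
    for i, xi in enumerate(x):
        y[h[i]] += s[i] * xi
    return y


def circular_convolution(a, b):
    """Direct O(d^2) circular convolution, written as plain loops for clarity."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out


def full_bilinear_fusion(visual, audio):
    """Flattened outer product: every visual element multiplies every audio element."""
    return [v * a for v in visual for a in audio]


def compact_bilinear_fusion(visual, audio, params_v, params_a, d):
    """Approximate the outer product in d dimensions via sketch and convolution."""
    hv, sv = params_v
    ha, sa = params_a
    return circular_convolution(
        count_sketch(visual, hv, sv, d),
        count_sketch(audio, ha, sa, d),
    )


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so the projection is reproducible
    visual = [0.5, -1.0, 2.0, 0.0]   # toy stand-in for a visual feature vector
    audio = [1.0, 3.0, -2.0]         # toy stand-in for an audio feature vector
    d = 8                            # hypothetical fused dimension
    params_v = make_sketch_params(len(visual), d, rng)
    params_a = make_sketch_params(len(audio), d, rng)
    fused = compact_bilinear_fusion(visual, audio, params_v, params_a, d)
    print(len(full_bilinear_fusion(visual, audio)), len(fused))
```

The full outer product grows as the product of the two feature dimensions, which is why a fixed-size sketch is attractive for high-dimensional deep features. Practical implementations usually compute the circular convolution with a fast Fourier transform rather than the nested loops used here.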


Journal: Computer Vision and Image Understanding
Publication Date: Jul 26, 2018
Citations: 80

