Abstract

Emotion recognition is a key technique for natural interaction between humans and artificial intelligence systems. For effective emotion recognition in the continuous-time domain, this article presents a multimodal fusion network that integrates a video modality network and an electroencephalogram (EEG) modality network. To compute the attention weights of facial video features and the corresponding EEG features during fusion, we propose a multimodal attention network that utilizes bilinear pooling based on low-rank decomposition. Finally, continuous-domain valence values are computed from the two modality network outputs and the attention weights. Experimental results show that the proposed fusion network improves performance by about 6.9% over the video modality network on the MAHNOB human–computer interface (MAHNOB-HCI) dataset. We also achieved a performance improvement on our proprietary dataset.
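The core idea, attention weights computed from a low-rank bilinear pooling of the two modality features and then used to mix the per-modality valence outputs, can be sketched as follows. This is a minimal illustration, not the paper's implementation: all dimensions, factor matrices, and the tanh nonlinearity are assumptions, and a full low-rank bilinear tensor is reduced to two projection matrices whose outputs are combined elementwise.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def low_rank_bilinear_attention(x_video, x_eeg, U, V, W_att):
    """Joint feature via low-rank bilinear pooling, then 2-way modality attention.

    A full bilinear interaction x_video^T W x_eeg needs a large tensor W; the
    low-rank trick replaces it with z = (U^T x_video) * (V^T x_eeg) elementwise.
    """
    z = np.tanh(U.T @ x_video) * np.tanh(V.T @ x_eeg)  # joint vector, shape (rank,)
    logits = W_att @ z                                 # one logit per modality, shape (2,)
    return softmax(logits)                             # attention weights summing to 1

rng = np.random.default_rng(0)
d_video, d_eeg, rank = 128, 64, 32                  # hypothetical feature sizes and rank
U = 0.1 * rng.standard_normal((d_video, rank))      # low-rank factor for video features
V = 0.1 * rng.standard_normal((d_eeg, rank))        # low-rank factor for EEG features
W_att = 0.1 * rng.standard_normal((2, rank))        # maps joint vector to attention logits

x_video = rng.standard_normal(d_video)
x_eeg = rng.standard_normal(d_eeg)
alpha = low_rank_bilinear_attention(x_video, x_eeg, U, V, W_att)

# Fuse the two modality networks' valence predictions (hypothetical values)
# as a convex combination weighted by the attention.
valence_video, valence_eeg = 0.4, 0.1
valence = alpha[0] * valence_video + alpha[1] * valence_eeg
```

Because the attention weights are a softmax output, the fused valence always lies between the two per-modality predictions, which matches the intuition that fusion interpolates between the modalities according to their estimated reliability.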

Highlights

  • Recognition of human emotions is a key technology for ultimate human–robot interaction (HRI)

  • Various emotion recognition mechanisms based on convolutional neural networks (CNNs), trained in an end-to-end manner, have been developed and have shown reliable performance [3], [4]

  • If we can analyze the characteristics of the two modalities and calculate their weights, we can achieve a synergy between the video modality and the EEG modality for emotion recognition


Summary

Introduction

Recognition of human emotions is a key technology for ultimate human–robot interaction (HRI). Conventional emotion recognition algorithms distinguished emotion categories by detecting changes in facial expressions [1], [2]. Various emotion recognition mechanisms based on convolutional neural networks (CNNs), trained in an end-to-end manner, have been developed and have shown reliable performance [3], [4]. There have also been many attempts to recognize human emotions from the tone of voice signals [5]. Since voice information is temporally sparse, voice tone-based emotion recognition schemes have a fundamental limitation in extracting consecutive emotions. Several emotion recognition algorithms using EEG, an electrical bio-signal generated in the human brain, have been reported [6]–[8].

