Multimodal Emotion Recognition Based on Deep Temporal Features Using Cross-Modal Transformer and Self-Attention

Bubai Maji, Rajlakshmi Guha, Monorama Swain, Aurobinda Routray

doi:10.1109/icassp49357.2023.10096937

Abstract

Multimodal speech emotion recognition (MSER) is an emerging and challenging research field, motivated by its greater robustness compared with unimodal approaches. However, how to model the interactions between different modalities of speech representation for emotion recognition has not yet been well investigated. To address this issue, we introduce a new approach to capturing the deep temporal features of audio and text. The audio features are learned with a convolutional neural network (CNN) and a bidirectional gated recurrent unit (Bi-GRU) network. The textual features are represented by GloVe word embeddings followed by a Bi-GRU. A cross-modal transformer block is designed for multimodal learning to better capture inter- and intra-modal interactions and temporal information between the audio and textual features. Further, a self-attention (SA) network is employed to select the more important emotional information from the fused multimodal features. We evaluate the proposed method on the IEMOCAP dataset with four emotion classes (angry, neutral, sad, and happy). The proposed method performs significantly better than the most recent state-of-the-art MSER methods.
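
The cross-modal attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, no multiple heads, and no Bi-GRU or CNN encoders, and the function names and toy feature values are hypothetical. It only shows how one modality can supply the queries while the other supplies the keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries come from one modality; keys and values come from the other.
    Each argument is a list of equal-length feature vectors (hypothetical
    stand-ins for encoder outputs).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query step to every step of the other modality.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


# Toy inputs (made up): 2 text steps and 3 audio frames, feature size 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

text_attends_audio = cross_modal_attention(text_feats, audio_feats, audio_feats)
audio_attends_text = cross_modal_attention(audio_feats, text_feats, text_feats)
```

Each output keeps the sequence length of the querying modality, so `text_attends_audio` has one vector per text step and `audio_attends_text` has one per audio frame. Calling the same function with a single sequence as queries, keys, and values gives plain self-attention over that sequence.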

Similar Papers

STERM: A Multimodal Speech Emotion Recognition Model in Filipino Gaming Settings
Giorgio Armani G Magno ... Lhuijee Jhulo V Cuchapin
01 Dec 2022

Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks
Ajwa Aslam ... Zulfiqar Habib
Applied Soft Computing | VOL. 144
10 Jun 2023

Multimodal Physiological Signal Emotion Recognition Based on Convolutional Recurrent Neural Network
Jinxiang Liao ... Qinghua Zhong
IOP Conference Series: Materials Science and Engineering | VOL. 782
01 Mar 2020

End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis
Muhammad Muzammel ... Alice Othmani
Computer Methods and Programs in Biomedicine | VOL. 211
28 Sep 2021
