Multi-Modal Attention for Speech Emotion Recognition

Zexu Pan, Haizhou Li, Zhaojie Luo, Jichen Yang

doi:10.21437/interspeech.2020-1653

Abstract

Emotion is an essential aspect of human speech and is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as the multi-modal attention network (MMAN), that makes use of visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA, which facilitates attention across the three modalities and selectively fuses their information. In the late-fusion stage, cLSTM-MMA is combined with other uni-modal sub-networks. Experiments show that speech emotion recognition benefits significantly from visual and textual cues, and that cLSTM-MMA alone achieves accuracy competitive with other fusion methods while using a much more compact network structure. The full hybrid network, MMAN, achieves state-of-the-art emotion recognition performance on the IEMOCAP database.
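The abstract does not give the equations of cLSTM-MMA, so what follows is only a minimal, dependency-free Python sketch of the two generic ideas it names: attention in which every position can attend across all three modalities, and late fusion of sub-network outputs. It is not the authors' implementation. Assumptions, none of which come from the paper: each modality has already been encoded into frame vectors of one common dimension; queries, keys and values are the raw frames (no learned projections and no LSTM); the attended sequence is mean-pooled; and late fusion is a weighted average of class posteriors. The names `cross_modal_attention` and `late_fusion` are hypothetical.

```python
import math
from typing import List

# Hypothetical type alias: one frame-level feature vector.
Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(modalities: List[List[Vector]]) -> Vector:
    """Scaled dot-product attention over the frames of all modalities, mean-pooled.

    Assumes every modality was already encoded to the same dimension d and
    uses the raw frames as queries, keys and values (no learned projections).
    """
    # Concatenate speech, visual and textual frames into one sequence so that
    # each query can attend to positions from any modality.
    seq = [frame for frames in modalities for frame in frames]
    if not seq:
        raise ValueError("at least one frame is required")
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    attended = []
    for q in seq:
        # Attention weights over all positions; they sum to 1 for each query.
        weights = softmax([dot(q, k) * scale for k in seq])
        # Weighted sum of the values (here the frames themselves).
        attended.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    # Mean-pool the attended sequence into a single utterance-level vector.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(d)]


def late_fusion(branch_probs: List[Vector], branch_weights: Vector) -> Vector:
    """Weighted average of the class posteriors produced by each sub-network."""
    total = sum(branch_weights)
    n_classes = len(branch_probs[0])
    return [
        sum(w * p[c] for w, p in zip(branch_weights, branch_probs)) / total
        for c in range(n_classes)
    ]


# Toy usage with made-up 2-dimensional features (not data from the paper).
speech = [[1.0, 0.0], [0.5, 0.5]]
visual = [[0.0, 1.0]]
text = [[0.2, 0.8], [0.9, 0.1]]
fused = cross_modal_attention([speech, visual, text])
print(fused)
print(late_fusion([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))  # [0.75, 0.25]
```

Because the attention weights are computed over the concatenated sequence, each position draws more heavily on whichever frames, from any modality, are most similar to it. This is one simple reading of "selectively fuses"; in a trained network the projections, attention parameters and late-fusion weights would be learned rather than fixed as here.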

Similar Papers

Textual Primacy Online: Impression Formation Based on Textual and Visual Cues in Facebook Profiles
Ayellet Pelled ... Tanya Zilberstein
American Behavioral Scientist | VOL. 61
01 Jun 2017

Con-Text: Text Detection for Fine-Grained Object Classification.
Sezer Karaoglu ... Jan C Van Gemert
IEEE Transactions on Image Processing | VOL. 26
24 May 2017

Children’s recognition of emotion in music and speech
Dianna Vidas ... Genevieve A Dingle
Music & Science | VOL. 1
01 Jan 2018

Impacts of Cues on Learning and Attention in Immersive 360-Degree Video: An Eye-Tracking Study.
Rui Liu ... Xiang Xu
Frontiers in Psychology | VOL. 12
27 Jan 2022