Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework

Yang Liu, Haoqin Sun, Wenbo Guan, Yuqi Xia, Zhen Zhao

doi:10.1016/j.specom.2022.02.006

Abstract

Accurately recognizing emotion from speech is a necessary yet challenging task because of the variability in both speech and emotion. In this paper, a novel method combining a self-attention mechanism with a multi-scale fusion framework is proposed for multi-modal speech emotion recognition (SER) using speech and text information. A self-attentional bidirectional contextual LSTM (bc-LSTM) is proposed to learn context-sensitive dependencies from speech. Specifically, the bidirectional LSTM (BLSTM) layer learns long-term dependencies and utterance-level contextual information, and the multi-head self-attention layer makes the model focus on the features most related to emotion. A self-attentional multi-channel CNN (MCNN), which takes advantage of static and dynamic channels, is applied to learn general and thematic features from text. Finally, a multi-scale fusion strategy, including feature-level fusion and decision-level fusion, is applied to improve overall performance. Experimental results on the benchmark dataset IEMOCAP show that our method achieves absolute improvements of 1.48% and 3.00% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
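
The self-attention step in the speech branch can be illustrated with a minimal sketch of generic multi-head scaled dot-product self-attention over a sequence of vectors, where `x` stands in for the BLSTM outputs (the BLSTM itself is not reproduced). This is the textbook formulation, not necessarily the paper's exact layer: the projection matrices `wq`, `wk`, `wv` are hypothetical inputs, and the output projection, dropout, and masking are omitted. Plain Python is used so the sketch has no dependencies.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax of one row of attention scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                              num_heads: int) -> Matrix:
    """Self-attention over a sequence x of shape (T, d).

    Queries, keys and values are projected with hypothetical matrices
    (d x d); each head attends using its own d/num_heads slice of the
    projected features, and head outputs are concatenated back to (T, d).
    """
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    dh = d // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    steps = range(len(x))
    out = [[0.0] * d for _ in steps]
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)  # feature slice owned by head h
        for i in steps:
            # Scaled dot product between step i's query and every key.
            scores = [sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(dh)
                      for j in steps]
            weights = softmax(scores)
            # Output for step i is the attention-weighted average of values.
            for c in cols:
                out[i][c] = sum(w * v[j][c] for j, w in zip(steps, weights))
    return out


if __name__ == "__main__":
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    frames = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    attended = multi_head_self_attention(frames, eye, eye, eye, num_heads=2)
    print(len(attended), len(attended[0]))  # 3 4
```

Identity projections are used here only to keep the example inspectable; in a trained model the projections are learned jointly with the rest of the network, which is what lets the attention weights emphasize emotion-relevant steps.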
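
The two fusion levels named in the abstract can be sketched as follows. The abstract does not specify the fusion rules, so this sketch assumes concatenation for feature-level fusion and a weighted average of class-probability vectors for decision-level fusion; the function names, branch weights, emotion labels, and probability values are all hypothetical.

```python
from typing import List, Sequence


def feature_level_fusion(speech_feat: Sequence[float],
                         text_feat: Sequence[float]) -> List[float]:
    """Assumed feature-level fusion: concatenate the utterance-level speech
    and text vectors so a single classifier sees both modalities."""
    return list(speech_feat) + list(text_feat)


def decision_level_fusion(branch_probs: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> List[float]:
    """Assumed decision-level fusion: weighted average of the class-probability
    vectors produced by several classifiers (e.g. speech-only, text-only and
    fused-feature branches)."""
    if len(branch_probs) != len(weights) or not branch_probs:
        raise ValueError("need one weight per branch and at least one branch")
    total = sum(weights)
    n_classes = len(branch_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, branch_probs)) / total
            for c in range(n_classes)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    labels = ["angry", "happy", "neutral", "sad"]  # hypothetical label set
    fused_vec = feature_level_fusion([0.2, 0.7], [0.1, 0.4, 0.9])
    print(len(fused_vec))                          # 5
    speech_p = [0.1, 0.6, 0.2, 0.1]                # hypothetical speech-branch output
    text_p = [0.2, 0.3, 0.4, 0.1]                  # hypothetical text-branch output
    fused_p = [0.1, 0.5, 0.3, 0.1]                 # hypothetical fused-feature output
    final = decision_level_fusion([speech_p, text_p, fused_p], [1.0, 1.0, 1.0])
    print(labels[argmax(final)])                   # happy
```

Equal weights are only a placeholder; in practice branch weights would be chosen on held-out data, and a stronger modality can be given more influence simply by raising its weight.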
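
In SER work on IEMOCAP, WA is commonly defined as overall accuracy and UA as the unweighted mean of per-class recall, so UA is less dominated by frequent classes. A minimal sketch of both metrics under these standard definitions follows; the function names and toy labels are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    """Reject empty or mismatched label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def weighted_accuracy(y_true: Sequence[Hashable],
                      y_pred: Sequence[Hashable]) -> float:
    """WA: fraction of all utterances classified correctly (overall accuracy)."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable],
                        y_pred: Sequence[Hashable]) -> float:
    """UA: mean of per-class recall, so every class counts equally
    regardless of how many utterances it has."""
    _check(y_true, y_pred)
    support = Counter(y_true)                                     # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)   # correct per true class
    return sum(hits[c] / n for c, n in support.items()) / len(support)


if __name__ == "__main__":
    y_true = ["neu"] * 6 + ["sad"] * 2 + ["ang"] * 2
    y_pred = ["neu"] * 6 + ["sad", "neu"] + ["neu", "neu"]
    print(weighted_accuracy(y_true, y_pred))    # 0.7
    print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

The toy predictor gets every neutral utterance right but misses most minority-class ones, so WA (0.7) looks better than UA (0.5); reporting both metrics exposes this gap.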

Journal: Speech Communication
Publication Date: Mar 3, 2022
Citations: 27

Similar Papers

Speech Emotion Recognition Using Dual-Stream Representation and Cross-Attention Fusion
Shaode Yu ... Hang Yu
Electronics, Vol. 13 | 04 Jun 2024

Using Auxiliary Tasks In Multimodal Fusion of Wav2vec 2.0 And Bert for Multimodal Emotion Recognition
Dekai Sun ... Jiqing Han
04 Jun 2023

Enterprise Strategic Management From the Perspective of Business Ecosystem Construction Based on Multimodal Emotion Recognition.
Wei Bi ... Hongshen Li
Frontiers in Psychology, Vol. 13 | 03 Mar 2022

Research on Multi-Label Text Classification Based on Multi-Channel CNN and BiLSTM
Shoujin Wang ... Yuanjiao Yang
01 Oct 2022
