Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions

Yang Liu, Zhen Zhao, Yuqi Xia, Haoqin Sun, Wenbo Guan

doi:10.1007/s11633-022-1356-x

Abstract

Due to the complexity of emotional expression, recognizing emotions from speech is a critical and challenging task. In most studies, certain emotions are easily misclassified. In this paper, we propose a new framework that integrates a cascaded attention mechanism and a joint loss for speech emotion recognition (SER), aiming to resolve feature confusion among emotions that are difficult to classify correctly. First, we extract the mel frequency cepstral coefficients (MFCCs) and compute their deltas and delta-deltas to form 3-dimensional (3D) features, thus effectively reducing the interference of external factors. Second, we employ spatiotemporal attention to selectively discover target emotion regions from the input features, where self-attention with head fusion captures the long-range dependency of temporal features. Finally, the joint loss function is employed to distinguish emotional embeddings with high similarity to enhance the overall performance. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) database indicate that the method achieves improvements of 2.49% and 1.13% in weighted accuracy (WA) and unweighted accuracy (UA), respectively, compared to state-of-the-art strategies.
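The first step of the abstract, stacking MFCCs with their deltas and delta-deltas into a 3D feature, can be illustrated with a minimal sketch. It is not the authors' implementation. The regression-style delta with a two-frame window, the 300 x 40 input size, the channel-first layout, and the helper names `delta` and `stack_3d` are all assumptions made for illustration, and random numbers stand in for real MFCCs.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-style delta along the time axis (axis 0).

    features has shape (frames, coefficients). The window half-width
    of 2 is an assumed value; the abstract does not state one.
    """
    frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on each side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + frames]   # frame t + n
        behind = padded[width - n : width - n + frames]  # frame t - n
        out += n * (ahead - behind)
    return out / denom


def stack_3d(mfcc: np.ndarray) -> np.ndarray:
    """Stack static, delta, and delta-delta features as three channels."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.stack([mfcc, d1, d2], axis=0)


# Placeholder input: random values standing in for 300 frames of 40 MFCCs.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((300, 40))
features = stack_3d(mfcc)
print(features.shape)  # (3, 300, 40)
```

The three channels play a role similar to the color channels of an image, which is one common reason for arranging speech features this way before a convolutional or attention front end.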

Journal: Machine Intelligence Research
Publication Date: Jun 1, 2023
Citations: 4

Similar Papers

Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition
Ziping Zhao ... Björn W Schuller
Neural Networks | VOL. 141
23 Mar 2021

BAT: Block and token self-attention for speech emotion recognition
Jianjun Lei ... Ying Wang
Neural Networks | VOL. 156
29 Sep 2022

Speech emotion recognition based on emotion perception
Gang Liu ... Shifang Cai
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2023
12 May 2023

Improve Accuracy of Speech Emotion Recognition with Attention Head Fusion
Mingke Xu ... Samee U Khan
01 Jan 2020
