Abstract

Speech emotion recognition (SER) has become a crucial topic in human–computer interaction. Feature representation plays an important role in SER, but it still faces challenges: it is hard to predict which features are most effective for SER, and emotion expression varies across cultures. Most previous studies use a single type of feature for the recognition task or perform early fusion of several features. However, a single type of feature cannot fully capture the emotions in speech signals, and because different features carry different information, direct fusion cannot integrate their complementary advantages. To overcome these challenges, this paper proposes AMSNet, a parallel network for multi-scale SER based on a connection attention mechanism. AMSNet fuses fine-grained frame-level hand-crafted features with coarse-grained utterance-level deep features. It also adopts different emotion feature extraction modules tailored to the temporal and spatial characteristics of speech signals, which enriches the features and improves their discriminative power. The network consists of a frame-level representation learning module (FRLM) based on the temporal structure and an utterance-level representation learning module (URLM) based on the global structure. An improved attention-based long short-term memory (LSTM) network is introduced into the FRLM to focus on the frames that contribute most to the final recognition result, while the URLM uses a convolutional neural network with squeeze-and-excitation blocks (SCNN) to extract deep features. In addition, a connection attention mechanism is proposed for feature fusion, assigning different weights to different features. Extensive experiments on the IEMOCAP and EmoDB datasets demonstrate the effectiveness and superior performance of AMSNet. Our code will be publicly available at https://codeocean.com/capsule/8636967/tree/v1.
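The fusion idea the abstract describes can be illustrated with a minimal sketch: two feature streams (frame-level hand-crafted features and utterance-level deep features) are each scored, the scores are normalized with a softmax into per-stream attention weights, and the weighted streams are concatenated. The exact formulation of the connection attention mechanism is defined in the paper itself; the function names, the scalar scoring scheme, and the parameter vectors below are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of scalar scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def connection_attention_fuse(frame_feat, utt_feat, w_frame, w_utt):
    """Illustrative attention-weighted fusion of two feature streams.

    frame_feat: frame-level (hand-crafted) feature vector
    utt_feat:   utterance-level (deep) feature vector
    w_frame, w_utt: hypothetical learned scoring vectors, one per stream
    """
    # Score each stream with a tanh-squashed dot product (an assumption).
    s_frame = math.tanh(sum(f * w for f, w in zip(frame_feat, w_frame)))
    s_utt = math.tanh(sum(f * w for f, w in zip(utt_feat, w_utt)))
    # Normalize the two scores into attention weights that sum to 1.
    a_frame, a_utt = softmax([s_frame, s_utt])
    # Concatenate the weighted streams into the fused representation.
    return [a_frame * x for x in frame_feat] + [a_utt * x for x in utt_feat]
```

In a trained network the scoring vectors would be learned jointly with the rest of the model; here they are fixed inputs so the weighting behavior can be inspected directly.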
