Abstract

Previous speech emotion recognition (SER) methods typically handle variable-length utterance inputs by padding shorter utterances or clipping longer ones to a fixed length, which may introduce uninformative padding or discard useful emotional segments. To address this issue, in this paper, we cast SER as a graph classification task by transforming variable-length utterances into graphs, avoiding padding or cutting altogether. In our approach, the frames (short windowed segments) of an utterance are represented as nodes in a graph. Acoustic features extracted from the frames serve as node feature vectors, and nodes are connected according to their temporal relationships. Different graph convolutional networks (GCNs) are explored for node/frame embedding learning, and several graph pooling methods are compared for deriving a graph/utterance-level emotional representation from the node embeddings. Extensive experiments with different GCN components and pooling mechanisms are conducted on the IEMOCAP and MSP-IMPROV datasets. The experimental results show that the combination of GraphSAGE with multi-head attention pooling (MHAPool) achieves the best weighted accuracy (WA) and comparable unweighted accuracy (UA) on both datasets relative to other state-of-the-art SER models, demonstrating the effectiveness of the proposed graph-based network for the SER task.
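To make the pipeline concrete, the sketch below illustrates the three stages the abstract names: frames become graph nodes joined by temporal edges, a GraphSAGE-style layer learns node embeddings, and multi-head attention pooling collapses them into a single utterance-level vector. This is not the authors' implementation; the line-graph adjacency, mean aggregator, feature dimension (40), head count (4), and class count (4) are all illustrative assumptions.

```python
# Minimal sketch of the graph-based SER pipeline described in the abstract.
# All dimensions and the exact layer designs are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def temporal_adjacency(num_frames: int) -> torch.Tensor:
    """Line-graph adjacency: frame t is linked to frames t-1 and t+1."""
    adj = torch.zeros(num_frames, num_frames)
    idx = torch.arange(num_frames - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj

class SAGELayer(nn.Module):
    """GraphSAGE-style layer with a mean aggregator: concatenate each
    node's own features with the mean of its neighbors' features."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh_mean = adj @ x / deg               # mean over temporal neighbors
        return F.relu(self.lin(torch.cat([x, neigh_mean], dim=-1)))

class MHAPool(nn.Module):
    """Multi-head attention pooling: each head computes a softmax weighting
    over the nodes; the heads' pooled vectors are concatenated and projected."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.score = nn.Linear(dim, num_heads)   # one attention score per head
        self.out = nn.Linear(num_heads * dim, dim)

    def forward(self, x):                        # x: (num_nodes, dim)
        attn = torch.softmax(self.score(x), dim=0)  # (num_nodes, num_heads)
        pooled = attn.T @ x                      # (num_heads, dim)
        return self.out(pooled.flatten())        # utterance-level vector

# Toy forward pass: 120 frames, 40-dim acoustic features, 4 emotion classes.
x = torch.randn(120, 40)
adj = temporal_adjacency(120)
h = SAGELayer(40, 64)(x, adj)
utt = MHAPool(64)(h)
logits = nn.Linear(64, 4)(utt)
```

Because the graph is built per utterance, a 50-frame and a 500-frame input pass through the same layers unchanged, which is precisely how the approach sidesteps padding and clipping.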
