ACG-EmoCluster: A Novel Framework to Capture Spatial and Temporal Information from Emotional Speech Enhanced by DeepCluster

Huan Zhao, Zixing Zhang, Xupeng Zha, Lixuan Li, Zhaoxin Xie, Yujiang Wang

doi:10.3390/s23104777

Abstract

Speech emotion recognition (SER) is the task of learning a mapping from speech features to emotion labels. Speech data have higher information saturation than images and stronger temporal coherence than text, which makes it difficult to learn speech features fully and effectively with feature extractors designed for images or text. In this paper, we propose ACG-EmoCluster, a novel semi-supervised framework for extracting spatial and temporal features from speech. The framework is equipped with a feature extractor that simultaneously extracts spatial and temporal features, and a clustering classifier that enhances the speech representations through unsupervised learning. Specifically, the feature extractor combines an Attn–Convolution neural network and a Bidirectional Gated Recurrent Unit (BiGRU). The Attn–Convolution network has a global spatial receptive field and can be generalized to the convolution block of any neural network according to the data scale. The BiGRU is conducive to learning temporal information on a small-scale dataset, thereby alleviating data dependence. Experimental results on the MSP-Podcast corpus demonstrate that ACG-EmoCluster captures effective speech representations and outperforms all baselines in both supervised and semi-supervised SER tasks.
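
The clustering step mentioned in the abstract can be illustrated with a small sketch. DeepCluster-style training alternates between clustering feature vectors with k-means and using the resulting cluster ids as pseudo-labels for a classifier. The code below shows only the pseudo-labelling half, on toy two-dimensional vectors standing in for utterance embeddings. The function names, the toy data, and the deterministic centroid initialisation are assumptions made for illustration and are not taken from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(features, k, iterations=20):
    """Cluster feature vectors with plain k-means; return one cluster id per vector.

    Hypothetical helper: centroids start from evenly spaced samples so the
    result is deterministic. Random or k-means++ initialisation is more usual.
    """
    n = len(features)
    centroids = [list(features[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each vector takes the id of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(f, centroids[j]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for embeddings produced by a feature extractor (assumed data).
embeddings = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [4.9, 5.0],
]
pseudo_labels = kmeans_pseudo_labels(embeddings, k=2)
print(pseudo_labels)
```

In a full pipeline these cluster ids would serve as training targets for unlabeled utterances, and the clustering would be repeated as the feature extractor improves.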

Journal: Sensors
Publication Date: May 16, 2023
License type: CC BY 4.0

Similar Papers

A Parallel Hybrid Neural Network With Integration of Spatial and Temporal Features for Remaining Useful Life Prediction in Prognostics
Jiusi Zhang ... Shen Yin
IEEE Transactions on Instrumentation and Measurement | VOL. 72
01 Jan 2023

Design of urban road fault detection system based on artificial neural network and deep learning.
Ying Lin
Frontiers in Neuroscience | VOL. 18
29 Apr 2024

TGA: A Novel Network Intrusion Detection Method Based on TCN, BiGRU and Attention Mechanism
Yangyang Song ... Haojie Wu
Electronics | VOL. 12
27 Jun 2023

A short-term wind speed prediction method utilizing rolling decomposition and time-series extension to avoid information leakage
Pinhan Zhou ... Guoji Xu
Energy Sources, Part A: Recovery, Utilization, and Environmental Effects | VOL. 46
28 Feb 2024
