Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Hailun Lian, Cheng Lu, Yan Zhao, Sunan Li, Tianhua Qi, Yuan Zong

doi:10.1016/j.eswa.2024.125162

Abstract

Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Subspace transfer learning is one of the mainstream approaches to cross-corpus SER: it uses projection matrices to learn feature representations that are shared by, and invariant across, the source and target corpora. However, these methods mainly model the mapping from the input low-level descriptor (LLD) feature space to the corpus-invariant emotional feature space. This mapping is implicit and offers little interpretability for the selected features, so it cannot pinpoint which acoustic features are actually corpus-invariant. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capability, the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) an emotion preservation term, which combines emotional regression with the ℓ2,1 norm and is mainly used to select features and ensure that they are related to emotion; and (2) a corpus invariance preservation term, which measures the difference in feature distribution between the source and target corpora. Minimizing this term bridges the gap between the source and target domains, ensuring that the selected acoustic features are corpus-invariant. We then conduct extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS).
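
The abstract names the ingredients of the CAFS loss but not its exact form. The following is a minimal NumPy sketch of one plausible instance, built entirely on our own assumptions: a linear regression loss ||X_s W − Y_s||_F^2 onto one-hot source labels, the ℓ2,1 penalty λ Σ_i ||w^i||_2 over the rows of W (one row per acoustic feature), and, as a stand-in for the corpus invariance term, the squared difference between projected source and target feature means, μ||m^T W||^2 (a linear maximum mean discrepancy; the paper may use a different distribution measure). The solver is the standard iteratively reweighted least-squares scheme for ℓ2,1-regularized regression. The functions `cafs_objective`, `cafs_sketch`, and `select_features` and the parameters `lam` and `mu` are hypothetical names, not taken from the paper.

```python
import numpy as np


def cafs_objective(W, Xs, Ys, Xt, lam, mu):
    """Value of the ASSUMED loss: regression + lam * l2,1 norm + mu * mean discrepancy."""
    m = Xs.mean(axis=0) - Xt.mean(axis=0)       # source minus target feature means, shape (d,)
    regression = np.sum((Xs @ W - Ys) ** 2)     # squared Frobenius norm of the residual
    l21 = np.sum(np.linalg.norm(W, axis=1))     # sum of row-wise l2 norms of W
    discrepancy = np.sum((m @ W) ** 2)          # ||m^T W||^2, a linear MMD proxy (assumption)
    return regression + lam * l21 + mu * discrepancy


def cafs_sketch(Xs, Ys, Xt, lam=1.0, mu=1.0, n_iter=50, eps=1e-8):
    """Minimise the assumed loss by iteratively reweighted least squares.

    Xs: (n_s, d) source features, Ys: (n_s, c) one-hot source labels,
    Xt: (n_t, d) unlabeled target features. Returns W with shape (d, c).
    """
    d = Xs.shape[1]
    m = Xs.mean(axis=0) - Xt.mean(axis=0)
    A = Xs.T @ Xs + mu * np.outer(m, m)   # parts of the normal equations that never change
    B = Xs.T @ Ys
    D = np.eye(d)                         # reweighting matrix for the l2,1 term
    for _ in range(n_iter):
        # Stationarity condition with D held fixed: (A + lam * D) W = B
        W = np.linalg.solve(A + lam * D, B)
        # Standard l2,1 reweighting D_ii = 1 / (2 ||w^i||_2); eps avoids division by zero
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
    return W


def select_features(W, k):
    """Indices of the k features whose rows of W have the largest l2 norm."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]
```

Features are ranked by the ℓ2 norm of their row in W. The ℓ2,1 penalty drives whole rows toward zero, so the rows that survive mark features that help predict the source labels while contributing little to the projected source/target mean shift.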
Statistical analysis of the acoustic features selected by CAFS reveals that some of them, e.g., Mel-Frequency Cepstral Coefficients (MFCC) and formant features, are corpus-invariant, which could inform feature selection in cross-corpus SER. These findings also lay a solid theoretical and empirical foundation for future research and applications in cross-corpus SER.
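
The abstract does not describe how the statistical analysis was carried out. One simple way to summarize selections across many cross-corpus tasks, sketched here purely as an assumption, is to map each selected feature name to a coarse acoustic group and report the fraction of tasks in which each group is selected at least once. The example names resemble openSMILE-style IS09 and eGeMAPS feature names (e.g., `mfcc1_sma3_amean`, `F1frequency_sma3nz_amean`), but the grouping rules, the example data, and the functions `feature_group` and `group_frequency` are hypothetical illustrations, not the paper's procedure.

```python
import re
from collections import Counter


def feature_group(name):
    """Map an openSMILE-style feature name to a coarse acoustic group (heuristic rules)."""
    n = name.lower()
    if "mfcc" in n:
        return "MFCC"
    if re.match(r"f[1-3](frequency|bandwidth|amplitude)", n):
        return "Formant"
    if n.startswith("f0"):
        return "F0"
    if "loudness" in n or "energy" in n:
        return "Energy/Loudness"
    return "Other"


def group_frequency(selected_per_task):
    """Fraction of tasks in which each group has at least one selected feature."""
    counts = Counter()
    for names in selected_per_task:
        counts.update({feature_group(n) for n in names})  # a set, so each group counts once per task
    total = len(selected_per_task)
    return {group: c / total for group, c in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical selections from three source->target tasks
    tasks = [
        ["mfcc1_sma3_amean", "F1frequency_sma3nz_amean", "loudness_sma3_amean"],
        ["mfcc2_sma3_stddevNorm", "F2bandwidth_sma3nz_amean"],
        ["mfcc_sma[3]_max", "F0_sma_max"],
    ]
    print(group_frequency(tasks))
```

Counting tasks rather than individual features keeps a group with many members, such as MFCCs, from dominating the summary simply because of its size.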

More From: Expert Systems With Applications

Similar Papers

Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition
Cheng Lu ... Chuangao Tang
Entropy | VOL. 24
29 Jul 2022

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
Siddique Latif ... Björn Schuller
IEEE Transactions on Affective Computing | VOL. 14
01 Jul 2023

Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition
Wanlu Zheng ... Yuan Zong
Virtual Reality & Intelligent Hardware | VOL. 3
01 Feb 2021

Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
Hongliang Fu ... Zhihao Zhuang
Entropy | VOL. 25
07 Jan 2023