Cross Corpus Speech Emotion Recognition using transfer learning and attention-based fusion of Wav2Vec2 and prosody features

Navid Naderi, Babak Nasersharif

doi:10.1016/j.knosys.2023.110814

Abstract

The performance of speech emotion recognition (SER) systems degrades when training and test conditions or corpora differ. Cross-corpus SER (CCSER) is a research branch that addresses adapting an SER system to recognize speech emotions in a corpus whose recording conditions or language differ from those of the training corpus. For CCSER, adaptation can be performed in the feature extraction module or in the emotion classifier, the two main components of an SER system. In this paper, we propose the AFTL method (attention-based feature fusion along with transfer learning), which includes methods for both feature extraction and classification in CCSER. In the feature extraction part, we use Wav2Vec 2.0 transformer blocks and prosody features, and we propose an attention method for fusing them. In the classifier part, we use transfer learning to transfer the knowledge of a model trained on a source emotional speech corpus to recognize emotions in a target corpus. We performed experiments on multiple emotional speech datasets as target corpora, with IEMOCAP as the source corpus. For instance, we achieve 92.45% accuracy on the EmoDB dataset while using only 20% of its speakers to adapt the source model. For the other target corpora, we also obtained acceptable results.
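The abstract does not give the details of the attention-based fusion, so the following is only a minimal sketch of one common way to fuse two feature streams with attention. It is not the authors' AFTL implementation. The names `attention_fuse`, `projections`, and `query`, the shared dimension of 16, the 8-dimensional prosody vector, and the random stand-in features are all assumptions made for illustration. The 768-dimensional vector mimics a pooled Wav2Vec 2.0 base output. Only the Python standard library is used.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical initializer: uniform values scaled by 1/sqrt(cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def attention_fuse(streams, projections, query):
    """Fuse feature streams of different sizes into one vector.

    Each stream is projected into a shared space, scored against a
    query vector, and the projected streams are averaged using the
    softmax of those scores as weights.
    """
    projected = [linear(w, x) for w, x in zip(projections, streams)]
    d = len(query)
    scores = [
        sum(q * p for q, p in zip(query, vec)) / math.sqrt(d)
        for vec in projected
    ]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, projected))
        for i in range(d)
    ]
    return fused, weights


rng = random.Random(0)
# Random stand-ins, NOT real features (assumed sizes for illustration).
wav2vec_feat = [rng.gauss(0, 1) for _ in range(768)]
prosody_feat = [rng.gauss(0, 1) for _ in range(8)]

shared_dim = 16
projections = [
    random_matrix(shared_dim, 768, rng),
    random_matrix(shared_dim, 8, rng),
]
query = [rng.gauss(0, 1) for _ in range(shared_dim)]

fused, weights = attention_fuse([wav2vec_feat, prosody_feat], projections, query)
print(len(fused), [round(w, 3) for w in weights])
```

In a trained system the projection matrices and the query would be learned rather than random. Under a transfer-learning setup like the one the abstract describes, a plausible (assumed) workflow is to fit these parameters on the source corpus and then fine-tune them, together with the classifier, on a small share of the target corpus.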

Journal: Knowledge-Based Systems
Publication Date: Jul 25, 2023
Citations: 2

Similar Papers

Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation
S Lalitha ... Yousef Ajami Alotaibi
Applied Acoustics | VOL. 170
22 Jul 2020

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
Siddique Latif ... Sara Khalifa
IEEE Transactions on Affective Computing | VOL. 14
01 Jul 2023

Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition
Cheng Lu ... Yuan Zong
Entropy | VOL. 24
29 Jul 2022

Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives.
Shiqing Zhang ... Xiaoming Zhao
Frontiers in Neurorobotics | VOL. 15
29 Nov 2021
