Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models

Maros Jakubec, Eva Lieskovska, Roman Jarina, Michal Spisiak, Peter Kasak

doi:10.3390/app14219981

Open Access

https://doi.org/10.3390/app14219981

Journal: Applied Sciences
Publication Date: Oct 31, 2024
License type: CC BY 4.0

Abstract

Automatic Speech Emotion Recognition (SER) plays a vital role in making human–computer interactions more natural and effective. A significant challenge in SER development is the limited availability of diverse emotional speech datasets, which hinders the application of advanced deep learning models. Transfer learning is a machine learning technique that helps address this issue by utilizing knowledge from pre-trained models to improve performance on a new task in a target domain, even with limited data. This study investigates the use of transfer learning from various pre-trained networks, including speaker embedding models such as d-vector, x-vector, and r-vector, and image classification models like AlexNet, GoogLeNet, SqueezeNet, ResNet-18, and ResNet-50. We also propose enhanced versions of the x-vector and r-vector models incorporating Multi-Head Attention Pooling and Angular Margin Softmax, alongside other architectural improvements. Additionally, reverberation from the Room Impulse Response datasets was added to the speech utterances to diversify and augment the available data. Notably, the enhanced r-vector model achieved classification accuracies of 74.05% Unweighted Accuracy (UA) and 73.68% Weighted Accuracy (WA) on the IEMOCAP dataset, and 80.25% UA and 79.81% WA on the CREMA-D dataset, outperforming the existing state-of-the-art methods. This study shows that using cross-domain transfer learning is beneficial for low-resource emotion recognition. The enhanced models developed in other domains (for non-emotional tasks) can further improve the accuracy of SER.
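
The abstract reports results as both Unweighted Accuracy (UA) and Weighted Accuracy (WA). As a minimal sketch, assuming the convention common in SER work (WA is overall accuracy across all utterances, UA is the mean of per-class recalls), the two metrics could be computed as below. The function names and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy, made-up labels for illustration only (not data from the paper).
y_true = ["neutral"] * 4 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 4 + ["angry", "sad"] + ["sad", "neutral"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On imbalanced data the two values diverge: in this toy case the majority class is classified perfectly, which lifts WA above UA.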

