Speech emotion recognition using data augmentation method by cycle-generative adversarial networks

Arash Shilandari, Hossein Marvi, Wenwu Wang, Hossein Khosravi

doi:10.1007/s11760-022-02156-9

Open Access

https://doi.org/10.1007/s11760-022-02156-9

Abstract

A major obstacle in developing speech emotion recognition (SER) systems is data scarcity, i.e., the lack of labeled data for training. Data augmentation is an effective way to increase the amount of training data. In this paper, we propose a cycle-generative adversarial network (cycle-GAN) for data augmentation in SER systems. For each of the five emotions considered, an adversarial network is designed to generate data whose distribution is similar to that of the real data in that class but different from those of the other classes. These networks are trained adversarially to produce feature vectors similar to those in the training set, which are then added to the original training sets. Instead of the common cross-entropy loss, we train the cycle-GANs with the Wasserstein divergence to mitigate the vanishing gradient problem and to generate high-quality samples. The proposed network is applied to SER using the EMO-DB dataset. The quality of the generated data is evaluated using two classifiers, one based on a support vector machine and one on a deep neural network. The results show a recognition accuracy, measured as unweighted average recall, of about 83.33%, which is better than that of the baseline methods compared.
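The abstract reports its result as unweighted average recall (UAR), which is the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly recognized samples per class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    # Recall is computed per class first, then averaged without class weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not taken from EMO-DB): four "anger", two "sadness".
y_true = ["anger", "anger", "anger", "anger", "sadness", "sadness"]
y_pred = ["anger", "anger", "anger", "sadness", "sadness", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, plain accuracy = {accuracy:.3f}")
```

In this toy case the recall is 0.75 for "anger" and 0.5 for "sadness", so the UAR is 0.625 while plain accuracy is about 0.667. The gap illustrates why UAR is often preferred when classes are imbalanced.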

Journal: Signal, Image and Video Processing
Publication Date: Feb 9, 2022
Citations: 14
License type: CC BY

Similar Papers

Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation
S Lalitha ... Yousef Ajami Alotaibi
Applied Acoustics | VOL. 170
22 Jul 2020

Speech emotion recognition systems and their security aspects
Itzik Gurowiec ... Nir Nissim
Artificial Intelligence Review | VOL. 57
21 May 2024

RMWSaug: Robust Multi-window Spectrogram Augmentation Approach for Deep Learning based Speech Emotion Recognition
Shehu Mohammed Yusuf ... E A Adedokun
06 Oct 2021

Emotional speech Recognition using CNN and Deep learning techniques
C Hema ... Fausto Pedro Garcia Marquez
Applied Acoustics | VOL. 211
28 Jun 2023