Abstract

Data scarcity and speech degradation due to environmental noise are two significant issues in the modelling and deployment of speech emotion recognition (SER) systems. Deep learning-based SER systems overfit during modelling because of scarce training samples. Although recent attempts to tackle both issues simultaneously using data augmentation have yielded promising results, they are not robust enough to handle speech degradation caused by real environmental noise, so there is a need to further improve the classification performance of deployed SER systems. This work proposes an SER system based on a novel robust multi-window spectrogram augmentation (RMWSaug) scheme and transfer learning to handle these issues simultaneously. First, the RMWSaug scheme uses multi-window and multi-noise conditioning of clean speech samples to create the additional speech spectrograms required for training. Then, pretrained networks are adapted for speech emotion recognition and finetuned on the generated training datasets to develop a model robust to noise-induced speech degradation, thereby improving classification performance in the wild. The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database was selected as the benchmark dataset for evaluating the proposed SER system. Experimental results show that the proposed system outperformed existing methods when deployed in the wild. The proposed SER system can be deployed to predict the emotions of speakers conversing on online platforms.
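The abstract does not give implementation details of RMWSaug, but the core idea it names (combining spectrograms computed with several analysis-window lengths with copies of the clean speech mixed with noise at chosen SNRs) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names (`rmwsaug`, `add_noise`, `spectrogram`), the window lengths, and the SNR values are all hypothetical choices for the example.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix noise into a clean signal at a target SNR (dB).

    Illustrative noise conditioning, not the paper's exact procedure."""
    # Tile/trim the noise to match the clean signal length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_clean / p_scaled_noise = 10^(snr_db/10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

def spectrogram(signal, win_len, hop):
    """Magnitude spectrogram via a Hann-windowed STFT (freq x time)."""
    window = np.hanning(win_len)
    frames = [signal[i : i + win_len] * window
              for i in range(0, len(signal) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def rmwsaug(clean, noises, win_lens, snr_dbs, hop=160):
    """Generate one spectrogram per (window length, noise, SNR) combination,
    plus a clean spectrogram per window length."""
    specs = []
    for win in win_lens:
        specs.append(spectrogram(clean, win, hop))          # clean view
        for noise in noises:
            for snr in snr_dbs:
                noisy = add_noise(clean, noise, snr)
                specs.append(spectrogram(noisy, win, hop))  # noisy views
    return specs
```

Each call multiplies the number of training spectrograms per utterance by `len(win_lens) * (1 + len(noises) * len(snr_dbs))`, which is how the scheme eases data scarcity while exposing the model to noise-degraded speech.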
