Vector learning representation for generalized speech emotion recognition

Sattaya Singkul, Kuntpong Woraratpanya

doi:10.1016/j.heliyon.2022.e09196

Open Access

https://doi.org/10.1016/j.heliyon.2022.e09196

Journal: Heliyon
Publication Date: Mar 1, 2022
Citations: 6
License type: cc-by-nc-nd

Affiliation: King Mongkut's Institute of Technology Ladkrabang

Abstract

Speech emotion recognition (SER) plays an important role in global business today by improving service efficiency. In the SER literature, many techniques use deep learning to extract and learn features. We recently proposed end-to-end learning for a deep residual local feature learning block (DeepResLFLB). The advantages of end-to-end learning are low engineering effort and less hyperparameter tuning. Nevertheless, this learning method is prone to overfitting. Therefore, this paper describes a “verify-to-classify” framework for learning vectors extracted from feature spaces of emotional information. The framework consists of two main parts: speech emotion learning and speech emotion recognition. Speech emotion learning consists of two steps: speech emotion verification enrolled training, and prediction. Residual learning (ResNet) with a squeeze-excitation (SE) block is used as the core component of both steps to extract emotional state vectors, and the emotion model is built in the speech emotion verification enrolled training step. The in-domain pre-trained weights of the trained emotion model are then transferred to the prediction step. As a result of speech emotion learning, the accepted model, validated by equal error rate (EER), is transferred to speech emotion recognition as out-domain pre-trained weights, which are ready for classification using a classical machine learning (ML) method. In this setting, a suitable loss function is important for working with emotional vectors. Two loss functions are proposed: angular prototypical loss, and softmax with angular prototypical loss. Experiments on two publicly available datasets, Emo-DB and RAVDESS, each in high- and low-quality environments, show that the proposed method can significantly improve generalized performance and explainable emotion results when evaluated by standard metrics: EER, accuracy, precision, recall, and F1-score.
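
To make the loss named in the abstract more concrete, the following is a minimal NumPy sketch of an angular prototypical loss in its commonly described form: the cosine similarity between one query embedding per class and every class centroid is scaled and shifted, then fed to a cross-entropy over classes. The function name, the `(n_classes, n_utts, dim)` array layout, and the fixed values of the scale `w` and bias `b` are assumptions made for this sketch and are not taken from the paper; in practice `w` and `b` are usually learnable, and the paper's own formulation may differ.

```python
import numpy as np


def angular_prototypical_loss(emb, w=10.0, b=-5.0, eps=1e-12):
    """Sketch of an angular prototypical loss (hypothetical helper).

    emb: array of shape (n_classes, n_utts, dim) with n_utts >= 2.
    w, b: scale and bias applied to the cosine similarity; fixed here
          for illustration, although they are typically learned.
    """
    emb = np.asarray(emb, dtype=float)

    # Use the last utterance of each class as the query and the mean of
    # the remaining utterances as that class's centroid (prototype).
    query = emb[:, -1, :]
    centroid = emb[:, :-1, :].mean(axis=1)

    # L2-normalise so that the dot product equals the cosine similarity.
    q = query / np.maximum(np.linalg.norm(query, axis=1, keepdims=True), eps)
    c = centroid / np.maximum(np.linalg.norm(centroid, axis=1, keepdims=True), eps)

    # logits[j, k] = w * cos(query_j, centroid_k) + b
    logits = w * (q @ c.T) + b

    # Numerically stable log-softmax over the centroid axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy: each query should match its own class centroid.
    return float(-np.mean(np.diag(log_prob)))


# Example: 3 emotion classes, 2 utterances each, 3-dimensional embeddings.
separated = np.stack([np.tile(np.eye(3)[k], (2, 1)) for k in range(3)])
collapsed = np.ones((3, 2, 3))
loss_separated = angular_prototypical_loss(separated)
loss_collapsed = angular_prototypical_loss(collapsed)
```

In this toy example, well-separated class embeddings give a loss close to zero, while embeddings that collapse to a single point give the chance-level value of log(3), which is the behaviour one would want from a loss that shapes emotional vectors for verification.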

Similar Papers

Children’s recognition of emotion in music and speech
Dianna Vidas ... Genevieve A Dingle
Music & Science | VOL. 1
01 Jan 2018

Deep Residual Local Feature Learning for Speech Emotion Recognition
Sattaya Singkul ... Thakorn Chatchaisathaporn
01 Jan 2020

Time Dependent ARMA for Automatic Recognition of Fear-Type Emotions in Speech
J C Vásquez-Correa ... L D Avendaño
01 Jan 2015

In-depth investigation of speech emotion recognition studies from past to present –The importance of emotion recognition from speech signal for AI–
Yeşim Ülgen Sönmez ... Asaf Varol
Intelligent Systems with Applications | VOL. 22
11 Mar 2024
