A BiLSTM–Transformer and 2D CNN Architecture for Emotion Recognition from Speech

Sera Kim, Seok-Pil Lee

doi:10.3390/electronics12194034

Abstract

Emotion recognition technology continues to grow in significance, and research in this field enables artificial intelligence to understand and react to human emotions accurately. This study aims to improve emotion recognition from speech by using dimensionality reduction algorithms for visualization, which helps outline emotion-specific audio features. We propose a new model architecture that combines a bidirectional long short-term memory (BiLSTM)–Transformer with a 2D convolutional neural network (CNN). The BiLSTM–Transformer processes audio features to capture the sequence of speech patterns, while the 2D CNN handles Mel-spectrograms to capture the spatial details of the audio. The model is validated with 10-fold cross-validation. The proposed methodology was applied to Emo-DB and RAVDESS, two major speech emotion recognition databases, and achieved unweighted accuracies of 95.65% and 80.19%, respectively. These results indicate that the proposed Transformer-based deep learning model, combined with appropriate feature selection, can enhance performance in emotion recognition from speech.
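The evaluation protocol named in the abstract (10-fold cross-validation scored by unweighted accuracy) can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not confirm: folds are built by a plain shuffled split, and unweighted accuracy follows the definition common in speech emotion recognition work, namely the mean of per-class recall. The function names `k_fold_indices`, `unweighted_accuracy`, and `weighted_accuracy` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Assumption: a plain shuffled split; the paper may build its folds differently.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every emotion class counts equally.

    Assumption: this is the conventional definition, not one quoted from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy imbalanced example: a model that always predicts "angry".
    y_true = ["angry"] * 8 + ["sad"] * 2
    y_pred = ["angry"] * 10
    print(weighted_accuracy(y_true, y_pred))    # 0.8
    print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

The toy example shows why the unweighted figure is the stricter one on imbalanced emotion classes: a model that ignores the minority class still reaches 0.8 plain accuracy but only 0.5 unweighted accuracy.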

Journal: Electronics	Publication Date: Sep 25, 2023
Citations: 5	License type: CC BY 4.0

Similar Papers

Deep Learning-Based Approach for Emotion Recognition Using Electroencephalography (EEG) Signals Using Bi-Directional Long Short-Term Memory (Bi-LSTM).
Mona Algarni ... Tawfik Al-Hadhrami
Sensors | VOL. 22
13 Apr 2022

An Improved Multimodal Dimension Emotion Recognition Based on Different Fusion Methods
Haiyang Su ... Bin Liu
06 Dec 2020

Speech Emotion Recognition Based on Two-Stream Deep Learning Model Using Korean Audio Information
A-Hyeon Jo ... Keun-Chang Kwak
Applied Sciences | VOL. 13
08 Feb 2023

An Ensemble Model for Multi-Level Speech Emotion Recognition
Chunjun Zheng ... Chunli Wang
Applied Sciences | VOL. 10
26 Dec 2019
