Abstract

The rapid progress in the field of human-computer interaction (HCI) has increased interest in speech emotion recognition (SER) systems. A speech emotion recognition system is a system that identifies the emotional state of a human being from his or her voice. SER has been studied extensively for many languages, but few studies have targeted Arabic, largely because of the shortage of available Arabic emotional speech databases; the languages most commonly considered for SER are English and other European and Asian languages. Researchers have used several machine learning classifiers to distinguish emotional classes, including support vector machines (SVMs), random forests (RFs), the k-nearest neighbors (KNN) algorithm, hidden Markov models (HMMs), multilayer perceptrons (MLPs), and deep learning. In this paper we propose ASERS-LSTM, an Arabic speech emotion recognition model based on a Long Short-Term Memory (LSTM) network. We extract five features from the speech: Mel-frequency cepstral coefficients (MFCC), chromagram, Mel-scaled spectrogram, spectral contrast, and tonal centroid (tonnetz) features. We evaluate our model on an Arabic speech dataset, the Basic Arabic Expressive Speech corpus (BAES-DB). In addition, we construct a deep neural network (DNN) to classify the emotions and compare its accuracy with that of the LSTM model: the DNN achieves 93.34% and the LSTM 96.81%.

Highlights

  • Voice is the sound produced by human beings, composed of a succession of sounds arranged according to specific control rules

  • Without applying preprocessing, the accuracy is 96.81% for the Long Short-Term Memory (LSTM) model and 93.34% for the DNN

  • To investigate the effect of the pre-processing stage, Table 2 shows the results after applying the two pre-processing steps, where the accuracies are enhanced to 97.44% for the LSTM and 97.78% for the DNN
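The two classifiers compared above can be sketched in Keras. This is an assumed architecture for illustration only: the layer sizes, the number of emotion classes in BAES-DB (taken here as 4), and the 193-dimensional input (from the five time-averaged features) are not specified in this preview.

```python
# Minimal sketch of the two compared classifiers; layer sizes and class count
# are assumptions, not the paper's reported configuration.
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 193  # assumed: 40 MFCC + 12 chroma + 128 Mel + 7 contrast + 6 tonnetz
NUM_CLASSES = 4     # assumed number of emotion classes in BAES-DB

def build_lstm():
    # Treats the utterance-level feature vector as a one-step sequence,
    # a common setup when feeding fixed-length features to an LSTM.
    return keras.Sequential([
        layers.Input(shape=(1, NUM_FEATURES)),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_dnn():
    # Plain feed-forward baseline on the same features.
    return keras.Sequential([
        layers.Input(shape=(NUM_FEATURES,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

lstm = build_lstm()
dnn = build_dnn()
for model in (lstm, dnn):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```

Both models would be trained on the extracted feature vectors and their labels; the preprocessing steps referenced in Table 2 would be applied to the audio before feature extraction.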

Introduction

Voice is the sound produced by human beings, composed of a succession of sounds arranged according to specific control rules. When two people talk on the phone, they cannot observe each other's facial expressions or physiological state, yet it is possible to roughly estimate the speaker's emotional state from the voice. Emotion is a state that combines human feelings, thoughts, and behaviours. It includes people's psychological reactions to the outside world or to their own stimuli, as well as the physiological reactions that accompany those psychological reactions. Emotions play a role everywhere in people's daily work and life. In the product development process, if we can identify the emotional state of the user while using the product, so as to understand the user experience, it is possible
