Abstract

Rapid progress in human-computer interaction (HCI) has increased interest in speech emotion recognition (SER) systems. An SER system identifies a speaker's emotional state from their voice. SER has been studied extensively for several languages, most commonly English and other European and Asian languages, but little work has addressed Arabic SER, largely because few Arabic speech emotion databases are available. Researchers have used several machine-learning classifiers to distinguish emotion classes, including support vector machines (SVMs), random forests (RFs), the k-nearest neighbors (KNN) algorithm, hidden Markov models (HMMs), multilayer perceptrons (MLPs), and deep learning. In this paper we propose ASERS-LSTM, a model for Arabic speech emotion recognition based on long short-term memory (LSTM). We extract five features from the speech signal: Mel-frequency cepstral coefficients (MFCCs), chromagram, Mel-scaled spectrogram, spectral contrast, and tonal centroid features (tonnetz). We evaluate the model on an Arabic speech dataset, the Basic Arabic Expressive Speech corpus (BAES-DB). We also build a deep neural network (DNN) to classify emotions and compare its accuracy with that of the LSTM model. The DNN achieves 93.34% accuracy and the LSTM 96.81%.

Highlights

  • Voice is the sound produced by human beings; it is composed of a succession of sounds arranged in a specific order according to their respective governing rules

  • Without pre-processing, the accuracy is 96.81% for the Long Short-Term Memory (LSTM) model and 93.34% for the DNN

  • To investigate the effect of the pre-processing stage, Table 2 shows the results after applying the two pre-processing steps: accuracy improves to 97.44% for the LSTM and 97.78% for the DNN
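
The effect of pre-processing in the highlights above can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of the paper's method: it takes the four reported accuracies and computes, for each model, the gain in percentage points and the relative reduction in error rate. The names `accuracy`, `preprocessing_gain`, and `summarize` are hypothetical and chosen here for readability.

```python
# Accuracies (%) copied from the highlights; nothing else is assumed.
accuracy = {
    "LSTM": {"raw": 96.81, "preprocessed": 97.44},
    "DNN": {"raw": 93.34, "preprocessed": 97.78},
}


def preprocessing_gain(raw, preprocessed):
    """Return (gain in percentage points, relative error reduction in %)."""
    gain = preprocessed - raw
    error_before = 100.0 - raw  # error rate before pre-processing
    return gain, 100.0 * gain / error_before


def summarize(acc):
    """Map each model name to its rounded (gain, relative error reduction)."""
    out = {}
    for model, a in acc.items():
        gain, rel = preprocessing_gain(a["raw"], a["preprocessed"])
        out[model] = (round(gain, 2), round(rel, 1))
    return out


summary = summarize(accuracy)
for model, (gain, rel) in summary.items():
    print(f"{model}: +{gain:.2f} points, {rel:.1f}% fewer errors")
```

On these figures the DNN gains 4.44 points and the LSTM 0.63 points, so with pre-processing the DNN (97.78%) slightly outperforms the LSTM (97.44%), the reverse of the ordering without pre-processing.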


Summary

Introduction

Voice is the sound produced by human beings; it is composed of a succession of sounds arranged in a specific order according to their respective governing rules. When two people talk on the phone, they cannot observe each other's facial expressions or physiological state, yet the speaker's emotional state can still be roughly estimated from the voice. Emotion is a state that combines human feelings, thoughts, and behaviours. It includes people's psychological reactions to external or internal stimuli, together with the physiological reactions that accompany them. Emotions play a role throughout people's daily work and life. During product development, if we can identify the emotional state of users while they use a product, and so understand the user experience, it is possible

