Abstract

Emotional speech recognition for Arabic has received far less attention in the literature than for other languages. In this paper, we present the creation and verification of the King Saud University Emotions (KSUEmotions) corpus, which was released by the Linguistic Data Consortium (LDC) in 2017 as the first public Arabic emotional speech corpus. KSUEmotions contains emotional speech from twenty-three speakers from Saudi Arabia, Syria, and Yemen, and covers five emotions: neutral, happiness, sadness, surprise, and anger. The corpus content is verified in two ways: a human perceptual test, in which nine listeners rate the emotional performance in the audio files, and automatic emotion recognition. Two automatic emotion recognition systems are evaluated: a Residual Neural Network and a Convolutional Neural Network. This work also experiments with emotion recognition for English using the Emotional Prosody Speech and Transcripts Corpus (EPST). The experimental work is conducted in three tracks: (i) monolingual, where independent experiments are carried out for Arabic and English; (ii) multilingual, where the Arabic and English corpora are merged into a mixed corpus; and (iii) cross-lingual, where models are trained on one language and tested on the other. A challenge encountered in this work is that the two corpora do not contain the same emotions. This problem is tackled by mapping the emotions to the arousal-valence space.
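
The idea of mapping categorical labels to the arousal-valence space can be illustrated with a minimal Python sketch. The placements below are illustrative assumptions that loosely follow the common circumplex convention; they are not the values used in this work, and the names `AROUSAL_VALENCE` and `to_quadrant` are hypothetical. Only the five KSUEmotions labels are covered, since the EPST label set is not listed here.

```python
# Hypothetical (arousal, valence) signs for the five KSUEmotions labels.
# +1 = high/positive, -1 = low/negative, 0 = midpoint.
# These placements are assumptions for illustration only; in particular,
# the valence of "surprise" is ambiguous and is set to positive here.
AROUSAL_VALENCE = {
    "neutral": (0, 0),
    "happiness": (+1, +1),
    "sadness": (-1, -1),
    "surprise": (+1, +1),
    "anger": (+1, -1),
}


def to_quadrant(label: str) -> str:
    """Map a categorical emotion label to a coarse arousal-valence class."""
    arousal, valence = AROUSAL_VALENCE[label.lower()]
    if arousal == 0 and valence == 0:
        return "neutral"
    a = "high" if arousal > 0 else "low"
    v = "positive" if valence > 0 else "negative"
    return f"{a}-arousal/{v}-valence"


if __name__ == "__main__":
    for name in AROUSAL_VALENCE:
        print(name, "->", to_quadrant(name))
```

Once both corpora are relabeled this way, utterances from different label sets share one target space, which is what makes merged or cross-lingual training possible in principle.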

Highlights

  • Digital emotional speech processing is an essential area of digital speech processing that addresses two main problems: comprehending emotions in speech and synthesizing them [1]

  • Monolingual emotion recognition, KSUEmotions corpus: this section presents the results produced by the designed systems over ten runs, with similar system parameters applied to each corpus separately

  • Researchers in Arabic speech emotion recognition (SER) suffer from the lack of an Arabic emotional speech corpus


Summary

Introduction

Digital emotional speech processing is an essential area of digital speech processing that addresses two main problems: comprehending emotions in speech and synthesizing them [1]. Speech corpora play a significant role in emotional speech processing. A corpus can be built from spontaneous speech, but this is challenging because it is hard to find people who express real emotions during recording. Acted speech corpora are created by actors whose performance is very close to genuine emotion, and elicited speech corpora are created by stimulating speakers to evoke target emotions [2]. The other factors that categorize emotional speech corpora are the spoken languages, the number of
