King Saud University Emotions Corpus: Construction, Analysis, Evaluation, and Comparison

Ali Hamid Meftah, Yousef A Alotaibi, Yasser Seddiq, Sid Ahmed Selouani, Mustafa A Qamhan

doi:10.1109/access.2021.3070751

Abstract

Emotional speech recognition for the Arabic language has received insufficient attention in the literature compared to other languages. In this paper, we present the creation and verification of the King Saud University Emotions (KSUEmotions) corpus, which was released by the Linguistic Data Consortium (LDC) in 2017 as the first public Arabic emotional speech corpus. KSUEmotions contains emotional speech from twenty-three speakers from Saudi Arabia, Syria, and Yemen, and covers five emotions: neutral, happiness, sadness, surprise, and anger. The corpus content is verified in two ways: a human perceptual test in which nine listeners rate the emotional performance in the audio files, and automatic emotion recognition. Two automatic emotion recognition systems are evaluated: a Residual Neural Network and a Convolutional Neural Network. This work also experiments with emotion recognition for English using the Emotional Prosody Speech and Transcripts Corpus (EPST). The experiments are conducted in three tracks: (i) monolingual, where independent experiments are carried out for Arabic and English; (ii) multilingual, where the Arabic and English corpora are merged into a mixed corpus; and (iii) cross-lingual, where models are trained on one language and tested on the other. A challenge encountered in this work is that the two corpora do not contain the same emotions. This problem is addressed by mapping the emotions to the arousal-valence space.
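The label-mapping step and the three experimental tracks can be illustrated with a small sketch. It assumes each utterance is a `(language, utterance_id, label)` tuple and places every label at hypothetical (arousal, valence) coordinates; the coordinates, the English label names standing in for EPST, and the helpers `quadrant`, `to_shared`, and `make_track` are all illustrative assumptions, not the paper's actual mapping or code. Only the five KSUEmotions label names come from the corpus description.

```python
from typing import Dict, List, Tuple

# Each utterance is (language, utterance_id, emotion_label); ids are hypothetical.
Utterance = Tuple[str, str, str]

# Hypothetical (arousal, valence) coordinates in [-1, 1]. The label names are the
# five KSUEmotions emotions; the coordinates are illustrative only.
KSU_COORDS: Dict[str, Tuple[float, float]] = {
    "neutral": (0.0, 0.0),
    "happiness": (0.6, 0.8),
    "surprise": (0.8, 0.2),   # assumption: surprise placed at mildly positive valence
    "anger": (0.8, -0.7),
    "sadness": (-0.6, -0.7),
}

# Placeholder English labels standing in for part of the EPST inventory (assumed names).
ENG_COORDS: Dict[str, Tuple[float, float]] = {
    "neutral": (0.0, 0.0),
    "happy": (0.5, 0.8),
    "angry": (0.9, -0.8),
    "sad": (-0.5, -0.8),
    "bored": (-0.7, -0.2),
}

COORDS = {"ar": KSU_COORDS, "en": ENG_COORDS}


def quadrant(arousal: float, valence: float) -> str:
    """Collapse a point in arousal-valence space to a coarse class shared by both corpora."""
    if arousal == 0.0 and valence == 0.0:
        return "neutral"
    a = "high" if arousal > 0 else "low"
    v = "positive" if valence > 0 else "negative"
    return f"{a}-arousal/{v}-valence"


def to_shared(utts: List[Utterance]) -> List[Tuple[str, str]]:
    """Replace each corpus-specific label with its arousal-valence class; drop unmapped labels."""
    out = []
    for lang, uid, label in utts:
        table = COORDS[lang]
        if label in table:
            out.append((uid, quadrant(*table[label])))
    return out


def make_track(track: str, train: List[Utterance], test: List[Utterance],
               src: str = "ar", tgt: str = "en"):
    """Build (train, test) sets for one experimental track.

    monolingual:   one language only (src), original labels kept
    multilingual:  both languages pooled, labels mapped to shared classes
    cross-lingual: train on src, test on tgt, labels mapped to shared classes
    """
    if track == "monolingual":
        tr = [(uid, label) for lang, uid, label in train if lang == src]
        te = [(uid, label) for lang, uid, label in test if lang == src]
        return tr, te
    if track == "multilingual":
        return to_shared(train), to_shared(test)
    if track == "cross-lingual":
        tr = [u for u in train if u[0] == src]
        te = [u for u in test if u[0] == tgt]
        return to_shared(tr), to_shared(te)
    raise ValueError(f"unknown track: {track}")
```

In this sketch the mapping is applied only where the two label inventories must meet (multilingual and cross-lingual), since a monolingual experiment can keep each corpus's native emotions. Collapsing to quadrants discards fine distinctions (for example, happiness and surprise may fall into the same class), which is the price of obtaining a label space both corpora share.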

Highlights

Digital emotional speech processing is an essential area of digital speech processing that addresses two main problems: understanding speech emotions and synthesizing them [1].
Monolingual emotion recognition on the KSUEmotions corpus: this section presents the results generated by the designed systems over ten runs, applied with similar system parameters to each corpus separately.
Researchers in Arabic speech emotion recognition (SER) suffer from the lack of an Arabic emotional speech corpus.
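Reporting results over ten runs with fixed system parameters amounts to repeating training and evaluation and summarizing the spread. The sketch below assumes the runs differ only in their random seed and that each run returns a single accuracy; `repeat_experiment` and the `run_once` callable are hypothetical names, not part of the described systems.

```python
import statistics
from typing import Callable, Tuple


def repeat_experiment(run_once: Callable[[int], float], n_runs: int = 10) -> Tuple[float, float]:
    """Run one train/evaluate cycle per seed and return (mean, sample std) of accuracy.

    Assumption: run_once(seed) trains and tests a model with identical
    hyperparameters, varying only the random seed, and returns its accuracy.
    """
    if n_runs < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    accuracies = [run_once(seed) for seed in range(n_runs)]
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Reporting the standard deviation alongside the mean shows how sensitive a model is to initialization, which matters when comparing architectures whose mean accuracies are close.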

Summary

Introduction

Digital emotional speech processing is an essential area of digital speech processing that addresses two main problems: understanding speech emotions and synthesizing them [1]. Speech corpora play a significant role in emotional speech processing. A corpus can be built from spontaneous speech, but this is challenging because it is difficult to find people who express genuine emotions while being recorded. Acted speech corpora are recorded by actors whose performances closely approximate genuine emotions. Elicited speech corpora are created by stimulating speakers to evoke target emotions [2]. Other factors used to categorize emotional speech corpora include the spoken language, the number of

Objectives

Methods

Results

Conclusion