A New Amharic Speech Emotion Dataset and Classification Benchmark

Ephrem Afele Retta, Mustafa Mhamed, Richard Sutcliffe, Eiad Almekhlafi, Jun Feng, Haider Ali

doi:10.1145/3529759

Abstract

In this article we present the Amharic Speech Emotion Dataset (ASED), which covers four dialects (Gojjam, Wollo, Shewa, and Gonder) and five emotions (neutral, fearful, happy, sad, and angry). We believe it is the first Speech Emotion Recognition (SER) dataset for the Amharic language. Sixty-five volunteer participants, all native speakers of Amharic, recorded 2,474 sound samples, each 2 to 4 seconds in length. Eight judges (two for each dialect) assigned emotions to the samples with a high level of agreement (Fleiss kappa = 0.8). The resulting dataset is freely available for download. Next, we developed a four-layer variant of the well-known VGG model, which we call VGGb. Three experiments were then carried out using VGGb for SER on ASED. First, we investigated which features work best for Amharic: FilterBank, Mel Spectrogram, or Mel-frequency Cepstral Coefficient (MFCC). This was done by training three VGGb SER models on ASED, using FilterBank, Mel Spectrogram, and MFCC features, respectively. Four forms of training were tried: standard cross-validation and three variants based on sentences, dialects, and speaker groups. In these variants, a sentence used for training would not be used for testing, and likewise for dialects and speaker groups. MFCC features were superior under all four training schemes. MFCC was therefore adopted for Experiment 2, where VGGb and three well-known existing models were compared on ASED: ResNet50, AlexNet, and LSTM. VGGb was found to have very good accuracy (90.73%) as well as the fastest training time. In Experiment 3, the performance of VGGb was compared when trained on two existing SER datasets, RAVDESS (English) and EMO-DB (German), as well as on ASED (Amharic). Results are comparable across these languages, with ASED being the highest. This suggests that VGGb can be successfully applied to other languages. We hope that ASED will encourage researchers to explore the Amharic language and to experiment with other models for Amharic SER.
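
The sentence-, dialect-, and speaker-group-based training schemes described in the abstract all rest on the same idea: every sample sharing a given attribute value goes entirely to training or entirely to testing. The abstract does not give the exact procedure, so the following is only a minimal sketch of such a held-out-group split. The function name, the metadata field names, and the sample records are hypothetical and are not taken from ASED.

```python
import random


def group_holdout_split(samples, group_key, test_fraction=0.2, seed=0):
    """Split samples so that no value of group_key appears in both parts.

    This is an illustrative helper, not the authors' implementation.
    """
    # Collect the distinct group values (e.g. speaker IDs) in a stable order.
    groups = sorted({s[group_key] for s in samples})

    # Shuffle groups reproducibly, then reserve a fraction of them for testing.
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    # Assign each sample according to the group it belongs to.
    train = [s for s in samples if s[group_key] not in test_groups]
    test = [s for s in samples if s[group_key] in test_groups]
    return train, test


# Hypothetical metadata records; the field names are assumptions.
DIALECTS = ["Gojjam", "Wollo", "Shewa", "Gonder"]
samples = [
    {
        "file": f"clip_{i:03d}.wav",
        "speaker": f"spk{i % 10}",
        "sentence": f"sent{i % 5}",
        "dialect": DIALECTS[i % 4],
    }
    for i in range(40)
]

# Speaker-based split: held-out speakers never appear in training.
train, test = group_holdout_split(samples, "speaker")
```

Passing "sentence" or "dialect" as the group key gives the other two variants. With only four dialects, holding one out removes a quarter of the data from training, which is one reason such splits are usually harder than standard cross-validation.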

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Jan 31, 2023
Citations: 5

Similar Papers

Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications
Gudmalwar Ashishkumar Prabhakar ... Ch V Rama Rao
IEEE Transactions on Consumer Electronics | VOL. 69
01 May 2023

Deep Learning Based Emotion Classification Using Mel Frequency Magnitude Coefficient
Siba Prasad Mishra ... Suman Deb
04 Mar 2023

Speech emotion recognition using cepstral features extracted with novel triangular filter banks based on bark and ERB frequency scales
Sugan Nagarajan ... Aniruddha Kanhe
Digital Signal Processing | VOL. 104
11 May 2020

Virtual human speech emotion recognition based on multi-channel CNN: MFCC, LPC, and F0 features
Liwen Ke
Journal of Physics: Conference Series | VOL. 2664
01 Dec 2023
