Abstract

Recently, recognizing the emotional content of speech signals has received considerable research attention, and systems have been developed to recognize the emotional content of a spoken utterance. Achieving high accuracy in speech emotion recognition remains a challenging problem due to issues related to feature extraction, type, and size. Central to this study is increasing emotion recognition accuracy by adapting the bag-of-words (BoW) technique from image processing to speech for feature processing and clustering. The BoW technique is applied to features extracted from Mel-frequency cepstral coefficients (MFCCs), which enhances feature quality. The study deploys different classification approaches to examine the performance of the embedded BoW approach: support vector machine (SVM), K-nearest neighbor (KNN), naive Bayes (NB), random forest (RF), and extreme gradient boosting (XGBoost). Experiments used the standard RAVDESS audio dataset with eight emotions: angry, calm, happy, surprised, sad, disgusted, fearful, and neutral. The maximum accuracy obtained for the angry class using SVM was 85%, while overall accuracy was 80.1%. The empirical results show that using BoW achieves better accuracy and processing time compared with other available methods.
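
A minimal sketch of the described MFCC-plus-BoW pipeline, assuming librosa for MFCC extraction and scikit-learn for k-means clustering and the SVM; the codebook size, number of MFCCs, and classifier settings are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: frame-level MFCCs -> k-means "audio word" codebook -> per-utterance
# BoW histogram -> SVM classifier. Paths and labels are placeholders.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def extract_mfcc(path, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of frame-level MFCC features."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def build_codebook(frame_lists, n_words=64):
    """Cluster all MFCC frames into n_words 'audio words' (the BoW codebook)."""
    all_frames = np.vstack(frame_lists)
    return KMeans(n_clusters=n_words, random_state=0).fit(all_frames)

def bow_histogram(frames, codebook):
    """Encode one utterance as a normalized histogram of audio-word counts."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Example usage (wav_paths and labels are hypothetical):
# frames = [extract_mfcc(p) for p in wav_paths]
# codebook = build_codebook(frames)
# X = np.array([bow_histogram(f, codebook) for f in frames])
# clf = SVC(kernel="rbf").fit(X, labels)
```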

Highlights

  • Speech is a natural modality of human-machine interaction

  • Speech emotion recognition systems have been used in forensic science, to investigate and detect criminals based on their speech and emotions [6]

  • Confusion matrices of the classification results for multiple classes using support vector machine (SVM), naive Bayes (NB), K-nearest neighbor (KNN), random forest (RF), and XGBoost are shown in Fig. 3 for all data from speech and song, as sketched below
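
A minimal sketch of how such per-classifier confusion matrices could be produced, assuming the BoW histograms from the pipeline above are available as a feature matrix X with integer-encoded emotion labels y; scikit-learn and xgboost are used with illustrative default settings rather than the paper's tuned configurations:

```python
# Sketch: train each of the five classifiers on a held-out split and collect
# its multi-class confusion matrix. Labels y are assumed to be integers 0..7.
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

classifiers = {
    "SVM": SVC(),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(),
    "XGBoost": XGBClassifier(),
}

def confusion_matrices(X, y, test_size=0.2, seed=0):
    """Return a dict mapping classifier name to its confusion matrix."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    return {name: confusion_matrix(y_te, clf.fit(X_tr, y_tr).predict(X_te))
            for name, clf in classifiers.items()}
```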


Summary

Introduction

Speech is a natural modality of human-machine interaction. The purpose of sophisticated speech systems should not be limited to message processing; rather, they should understand the underlying intentions of the speaker by detecting expressions in speech [1]. Effective speech emotion recognition models should be able to recognize speakers’ emotions and perform the corresponding actions. Several limitations degrade most emotion recognition models across almost all existing emotional speech databases [7]. The primary issue that limits recognition accuracy is the lack of benchmarking databases that can be shared among researchers. Another issue is the lack of coordination among researchers in this field; the same recording mistakes are repeated across different emotional speech databases. The main advantages of using BoW are increased recognition accuracy and reduced processing time. The remainder of this paper is organized as follows: Section 2 discusses previous studies related to speech emotion recognition.

Prior Research
Proposed Recognition System
Feature Extraction Process
BoW and Clustering Processes
Classification Process
Support Vector Machine Classifiers
Naive Bayes Classifier
Random Forest Classifier
Extreme Gradient Boosting Classifier
Simulation Experiments and Results
Discussions
Conclusions and Future Work
