Abstract

Speech is one of the most natural ways for human beings to express themselves. Given the relevance of emotions in today's digital world of remote communication, their detection and analysis are crucial. Emotion recognition is extremely difficult because emotions vary from person to person. Speech emotion recognition is one application area in which deep neural networks have excelled, yet most prior work in this field has relied on single learners. We have developed a Speech Emotion Recognition (SER) system named SERNet that processes and classifies speech inputs to recognize emotions. As part of our research, we explore the emotional content of speech recordings by analyzing their acoustic characteristics. This study proposes a novel approach that uses an ensemble of binary classifiers to decompose the multiclass classification problem into binary classification problems, with the aim of improving overall model performance. The binary classifiers are combined using a multilayer perceptron to obtain the final predictions on the multiclass classification problem. The efficacy of this strategy is demonstrated on a benchmark dataset designed specifically for speech emotion recognition. Experimental results show that, with an accuracy of 98.81%, this technique outperforms the most advanced models currently available.
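As an illustration only, the following minimal sketch shows one plausible reading of this architecture: a one-vs-rest binary classifier is trained for each emotion class, and a multilayer perceptron then combines their outputs into the final multiclass prediction. The one-vs-rest decomposition, logistic-regression base learners, synthetic feature vectors, and all hyperparameters are assumptions made for demonstration; the paper's actual binary classifiers, acoustic features, and dataset are not specified in this abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for acoustic feature vectors (e.g., MFCC statistics)
# over 8 emotion classes; the real system would use speech features.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=20,
                           n_classes=8, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

n_classes = len(np.unique(y))

# Stage 1: one binary (one-vs-rest) classifier per emotion class.
binary_models = []
for c in range(n_classes):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, (y_train == c).astype(int))
    binary_models.append(clf)

def binary_scores(models, X):
    # Stack each binary classifier's positive-class probability
    # into a meta-feature vector for the second stage.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Stage 2: a multilayer perceptron ensembles the binary outputs
# into the final multiclass prediction.
meta = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
meta.fit(binary_scores(binary_models, X_train), y_train)

y_pred = meta.predict(binary_scores(binary_models, X_test))
print(f"ensemble accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

In a practical pipeline, the second-stage perceptron would typically be trained on out-of-fold predictions of the binary classifiers rather than on their training-set outputs, to avoid leaking information into the meta-learner.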
