Abstract

Speech is one of the most natural ways for human beings to express themselves. Given the relevance of emotions in today's digital world of remote communication, their detection and analysis are crucial. Emotion recognition is extremely difficult because emotions vary from person to person. Speech emotion recognition is one application area in which deep neural networks have excelled, yet most prior work in this field has relied on single learners. We have developed a Speech Emotion Recognition (SER) system named SERNet that processes and classifies speech inputs to recognize emotions. As part of our research, we explore the emotional content of speech recordings by analyzing their acoustic characteristics. This study proposes a novel approach that uses an ensemble of binary classifiers to decompose the multiclass classification problem into binary classification problems, with the aim of improving overall model performance. The binary classifiers are combined using a multilayer perceptron to obtain the final predictions on the multiclass classification problem. The efficacy of this strategy is demonstrated on a benchmark dataset designed specifically for speech emotion recognition. Experimental results show that, with an accuracy of 98.81%, this technique outperforms the most advanced models currently available.
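As an illustration only, the following minimal sketch shows one plausible reading of this architecture: a one-vs-rest binary classifier is trained for each emotion class, and a multilayer perceptron then combines their outputs into the final multiclass prediction. The one-vs-rest decomposition, logistic-regression base learners, synthetic feature vectors, and all hyperparameters are assumptions made for demonstration; the paper's actual binary classifiers, acoustic features, and dataset are not specified in this abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for acoustic feature vectors (e.g., MFCC statistics)
# over 8 emotion classes; the real system would use speech features.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=20,
                           n_classes=8, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

n_classes = len(np.unique(y))

# Stage 1: one binary (one-vs-rest) classifier per emotion class.
binary_models = []
for c in range(n_classes):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, (y_train == c).astype(int))
    binary_models.append(clf)

def binary_scores(models, X):
    # Stack each binary classifier's positive-class probability
    # into a meta-feature vector for the second stage.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Stage 2: a multilayer perceptron ensembles the binary outputs
# into the final multiclass prediction.
meta = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
meta.fit(binary_scores(binary_models, X_train), y_train)

y_pred = meta.predict(binary_scores(binary_models, X_test))
print(f"ensemble accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

In a practical pipeline, the second-stage perceptron would typically be trained on out-of-fold predictions of the binary classifiers rather than on their training-set outputs, to avoid leaking information into the meta-learner.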
