Abstract

This paper presents a study of human emotions conveyed through sound and speech. Perceiving a person's emotions from sound has always been a difficult task for machines, yet if a machine can recognize the emotion expressed by its user, it can assist the user more effectively in the task at hand. Many researchers have worked on this problem. Some classify emotions into four basic categories: "Happy", "Sad", "Angry", and "Neutral", while others use dimensional attributes such as Valence (positivity), Activation (energy), and Dominance (controlling impact) to detect emotion from speech. Speech emotion recognition remains challenging for machine learning because sound and speech signals contain a plethora of frequencies and features to analyse. We also present a comparative study across different models and datasets, namely the RAVDESS and IEMOCAP datasets. We implemented a Multi-Layer Perceptron (MLP) model that achieves 66.2% accuracy on emotions such as happiness, sadness, anger, neutral, calm, and disgust. Our evaluation shows that the proposed approach yields accuracies of 65.8%, 66.2%, and 63.2% using the Random Forest (RF), MLP, and SVM classifiers, respectively.
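
To illustrate the kind of pipeline the abstract describes, the sketch below extracts spectral features from audio clips with librosa and trains a scikit-learn MLPClassifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the MFCC/chroma/mel feature set, the hyperparameters, and the train_emotion_classifier helper are illustrative choices, and the paper does not specify the exact features or settings used on RAVDESS.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(path):
    """Return one fixed-length feature vector (MFCC + chroma + mel) per audio file."""
    y, sr = librosa.load(path, sr=None)
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    return np.concatenate([mfcc, chroma, mel])

def train_emotion_classifier(files, labels):
    """Train an MLP on labelled audio files (e.g. RAVDESS paths with emotion labels)."""
    X = np.array([extract_features(f) for f in files])
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=0
    )
    # Hidden-layer size and iteration count are illustrative, not the paper's values.
    clf = MLPClassifier(hidden_layer_sizes=(300,), max_iter=500)
    clf.fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf

Swapping MLPClassifier for RandomForestClassifier or SVC from scikit-learn gives the other two classifiers mentioned in the comparison, with the same feature extraction step reused.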
