Abstract

Extensive research has been conducted in recent years to better understand human emotions, and technology that recognizes and responds to them has become an essential part of modern society. In this study, we present a fully functional multi-modal emotion recognition system that integrates data from text, voice, facial expressions, and body language. The automatic classification of anger, fear, joy, sadness, surprise, disgust, and neutral emotions from text, facial expressions, voice, and body movements is evaluated on the TESS, MELD, FER2013, and EDNLP datasets. A Random Forest classifier is used for emotion classification from body language, a pre-trained VGG16 model for facial emotion classification, logistic regression for text emotion classification, and a CNN for voice emotion classification. The logistic regression model for text emotion prediction leverages natural language processing (NLP) techniques to extract emotions from textual data. The CNN-based voice model applies speech recognition and emotion recognition algorithms to analyze audio signals and detect emotional cues in the speaker's voice. The facial expression model combines the pre-trained VGG16 network with modified convolutional layers to detect emotions. Meanwhile, the Random Forest classifier captures and interprets non-verbal cues, such as gestures, posture, and overall body movements, to enrich the emotion detection process. The real strength of our proposed system lies in its ability to synergistically combine information from multiple modalities.
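To make the facial-expression branch concrete, the following is a minimal sketch of a VGG16-based transfer-learning classifier in Keras. It assumes 48x48 FER2013 grayscale images replicated to three channels, and uses a small dense classification head as a stand-in for the authors' modified convolutional layers; the layer sizes, dropout rate, and optimizer are illustrative assumptions, not the published configuration.

```python
# Illustrative sketch of the facial-expression branch: a frozen VGG16 base
# with a small classification head. Layer sizes, dropout rate, and optimizer
# are assumptions for illustration, not the authors' published configuration.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_EMOTIONS = 7  # anger, fear, joy, sadness, surprise, disgust, neutral

# FER2013 images are 48x48 grayscale; VGG16 expects 3-channel input, so the
# grayscale channel is assumed to be replicated to RGB during preprocessing.
base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
base.trainable = False  # freeze the pre-trained convolutional base

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_EMOTIONS, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the pre-trained base and training only the head is a common way to reuse ImageNet features on a small emotion dataset; fine-tuning the upper convolutional blocks afterwards is an optional refinement.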
