Abstract

Emotions play a central role in human interaction, yet existing emotion recognition systems remain limited in accuracy and depth. This study addresses these shortcomings by refining multimodal emotion recognition. The proposed approach combines Convolutional Neural Networks (CNNs), attention layers, and Bi-LSTM networks. Audio is extracted from video data and fused with text in varying combinations, and SoftMax classifiers perform the final emotion classification. The results show that fusing audio, text, and video, together with the CNN-LSTM and attention-layer architecture, improves recognition performance. The discussion interprets these results, demonstrating the efficacy of the proposed approach in addressing the challenges of multimodal emotion recognition and providing a more comprehensive understanding of emotional expression.
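The fusion and classification steps described above can be sketched in miniature. The snippet below is a hedged illustration only, not the paper's implementation: the feature dimensions, the six-emotion label set, the random stand-in weights, and the scalar attention scores are all assumptions. It shows the general pattern of per-modality SoftMax classifiers whose outputs are fused under attention-style weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-modality feature vectors (dimensions are illustrative;
# in the paper these would come from CNN / Bi-LSTM encoders).
features = {
    "audio": rng.normal(size=128),
    "text":  rng.normal(size=300),
    "video": rng.normal(size=512),
}
n_emotions = 6  # assumed label set, e.g. anger, fear, joy, sadness, surprise, neutral

# One linear classifier head per modality; random weights stand in for trained ones.
heads = {m: rng.normal(size=(v.shape[0], n_emotions)) * 0.05
         for m, v in features.items()}
per_modality_probs = {m: softmax(features[m] @ heads[m]) for m in features}

# Attention-style fusion: a scalar score per modality, normalized with
# softmax, weights each modality's emotion distribution.
attn_scores = rng.normal(size=len(features))
attn_weights = softmax(attn_scores)
fused = sum(w * per_modality_probs[m] for w, m in zip(attn_weights, features))

predicted = int(np.argmax(fused))  # index of the fused emotion prediction
```

Because each modality's output is a probability distribution and the attention weights sum to one, the fused vector is itself a valid distribution over emotions.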
