Multimodal sentiment recognition: A comprehensive review of analysis techniques, applications, and challenges

Yuetong Hao

doi:10.54254/2753-8818/13/20240860

Abstract

This article offers a comprehensive review of multimodal emotion recognition. We first give a brief overview of human emotions by examining several emotion models. The focus then shifts to how emotions are represented in unimodal data sources, illustrating how each single channel captures emotional cues. Next, the article surveys advances in multimodal fusion techniques, which combine multiple data sources for more accurate emotion detection, and details three predominant fusion architectures along with the approach and benefits of each. Despite substantial progress in this field, several challenges persist. The final section of this review highlights these open issues in multimodal emotion recognition and points to potential areas for further research and development. By bringing together emotion models, unimodal representations, fusion techniques, and existing challenges, this paper seeks to give readers a holistic understanding of the current landscape of multimodal emotion recognition.

Full Text