Abstract

In recent years, affective computing has become a topic of considerable interest, driven by its potential to enhance domains such as mental health monitoring, human–computer interaction, and personalized advertising. Progress in affective computing has been supported extensively by sub-domains such as sentiment analysis and emotion recognition. Furthermore, Deep Learning (DL) techniques have advanced emotion recognition significantly, giving rise to Multimodal Emotion Recognition (MER) systems capable of effectively processing data from various sources, such as audio, video, and text. However, despite this considerable progress, several challenges persist in MER systems. Moreover, existing surveys often lack a specific focus on MER and the associated DL architectures. To address these research gaps, this study provides an in-depth systematic review of DL-based MER systems. The review encompasses recent state-of-the-art models, foundational theories, DL architectures, mechanisms for fusing multimodal information, relevant datasets, performance evaluation, and practical applications. Additionally, the study identifies key challenges and limitations of MER systems and suggests future research opportunities. The main objective of this review is to provide a thorough understanding of the current state of the art in MER, enabling researchers in both academia and industry to stay up to date with the most recent developments in this rapidly evolving domain.
