Abstract

Multimodal sentiment analysis (MSA) is a significant research area within the domain of social networks, where users continuously share vast amounts of content. This content includes text, images, and audio, all of which are accompanied by expressions of sentiment or emotion. MSA encompasses multiple tasks, such as sentiment prediction and emotion recognition, which exhibit multi-task dynamic relationships owing to their close correlation and complementary nature. However, previous works based on single-task frameworks have been limited in their ability to explore these dynamics. In this work, we propose a discriminative joint multi-task framework (DJMF) that performs sentiment prediction and emotion recognition simultaneously. The discriminative learning strategy accounts for intra-task dynamics, and customized fusion schemes are applied to each subtask. Furthermore, we employ a joint training strategy that captures inter-task dynamics by leveraging the interrelatedness of the tasks, enhancing the performance of each. In the fusion process, we additionally incorporate two categories of regularizers that encourage the model to produce distinct fusion results by reducing redundancy and task-irrelevant information. We conducted extensive evaluations on three well-known multimodal benchmark datasets: CMU-MOSI, CMU-MOSEI, and UR-FUNNY. The results demonstrate that our approach offers better reliability and stability than a single-task framework and outperforms state-of-the-art methods.
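To make the joint multi-task idea concrete, the following is a minimal sketch of training sentiment prediction and emotion recognition from a shared set of unimodal features with task-specific fusion layers and a combined objective. All module names, dimensions, loss weights, and the redundancy penalty are illustrative assumptions for exposition; they are not the exact DJMF architecture, fusion schemes, or regularizers described in the paper.

```python
# Hypothetical sketch of joint multi-task training (not the authors' exact method).
import torch
import torch.nn as nn

class JointMultiTaskModel(nn.Module):
    def __init__(self, text_dim, audio_dim, visual_dim, hidden_dim, num_emotions):
        super().__init__()
        # Shared unimodal encoders (placeholders for the paper's encoders).
        self.text_enc = nn.Linear(text_dim, hidden_dim)
        self.audio_enc = nn.Linear(audio_dim, hidden_dim)
        self.visual_enc = nn.Linear(visual_dim, hidden_dim)
        # Task-specific fusion layers: each subtask gets its own fusion of the
        # concatenated unimodal features (a stand-in for customized fusion).
        self.sent_fusion = nn.Linear(3 * hidden_dim, hidden_dim)
        self.emo_fusion = nn.Linear(3 * hidden_dim, hidden_dim)
        # Task heads: regression for sentiment, multi-label logits for emotions.
        self.sent_head = nn.Linear(hidden_dim, 1)
        self.emo_head = nn.Linear(hidden_dim, num_emotions)

    def forward(self, text, audio, visual):
        feats = torch.cat(
            [self.text_enc(text), self.audio_enc(audio), self.visual_enc(visual)],
            dim=-1,
        )
        z_sent = torch.relu(self.sent_fusion(feats))
        z_emo = torch.relu(self.emo_fusion(feats))
        return self.sent_head(z_sent).squeeze(-1), self.emo_head(z_emo), z_sent, z_emo


def joint_loss(sent_pred, sent_true, emo_logits, emo_true, z_sent, z_emo,
               lam=1.0, alpha=0.1):
    """Joint objective: per-task losses plus a simple redundancy penalty that
    discourages the two fused representations from collapsing onto the same
    features (an assumed stand-in for the paper's regularizers)."""
    l_sent = nn.functional.mse_loss(sent_pred, sent_true)
    l_emo = nn.functional.binary_cross_entropy_with_logits(emo_logits, emo_true)
    redundancy = nn.functional.cosine_similarity(z_sent, z_emo, dim=-1).abs().mean()
    return l_sent + lam * l_emo + alpha * redundancy
```

Optimizing the two task losses through shared encoders is what lets the tasks inform each other, while the penalty on the similarity of the two fused representations illustrates one way a regularizer can push the model toward distinct, less redundant fusion results.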
