Abstract

Multimodal sentiment analysis (MSA) is a significant research area within the domain of social networks, where users continuously share vast amounts of content. This content includes text, images, and audio, all of which are accompanied by expressions of sentiment or emotion. MSA encompasses multiple tasks, such as sentiment prediction and emotion recognition, which exhibit multi-task dynamic relationships owing to their close correlation and complementary nature. However, previous works based on single-task frameworks have been limited in their ability to explore these dynamics. In this work, we propose a discriminative joint multi-task framework (DJMF) that performs sentiment prediction and emotion recognition simultaneously. The discriminative learning strategy accounts for intra-task dynamics, and customized fusion schemes are applied to each subtask. Furthermore, we employ a joint training strategy that captures inter-task dynamics by leveraging the interrelatedness of the tasks, enhancing the performance of each. In the fusion process, we additionally incorporate two categories of regularizers that encourage the model to produce distinct fusion results by reducing redundancy and task-irrelevant information. We conducted extensive evaluations on three well-known multimodal benchmark datasets: CMU-MOSI, CMU-MOSEI, and UR-FUNNY. The results demonstrate that our approach offers better reliability and stability than a single-task framework and outperforms state-of-the-art methods.
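To make the joint multi-task idea concrete, the following is a minimal sketch of training sentiment prediction and emotion recognition from a shared set of unimodal features with task-specific fusion layers and a combined objective. All module names, dimensions, loss weights, and the redundancy penalty are illustrative assumptions for exposition; they are not the exact DJMF architecture, fusion schemes, or regularizers described in the paper.

```python
# Hypothetical sketch of joint multi-task training (not the authors' exact method).
import torch
import torch.nn as nn

class JointMultiTaskModel(nn.Module):
    def __init__(self, text_dim, audio_dim, visual_dim, hidden_dim, num_emotions):
        super().__init__()
        # Shared unimodal encoders (placeholders for the paper's encoders).
        self.text_enc = nn.Linear(text_dim, hidden_dim)
        self.audio_enc = nn.Linear(audio_dim, hidden_dim)
        self.visual_enc = nn.Linear(visual_dim, hidden_dim)
        # Task-specific fusion layers: each subtask gets its own fusion of the
        # concatenated unimodal features (a stand-in for customized fusion).
        self.sent_fusion = nn.Linear(3 * hidden_dim, hidden_dim)
        self.emo_fusion = nn.Linear(3 * hidden_dim, hidden_dim)
        # Task heads: regression for sentiment, multi-label logits for emotions.
        self.sent_head = nn.Linear(hidden_dim, 1)
        self.emo_head = nn.Linear(hidden_dim, num_emotions)

    def forward(self, text, audio, visual):
        feats = torch.cat(
            [self.text_enc(text), self.audio_enc(audio), self.visual_enc(visual)],
            dim=-1,
        )
        z_sent = torch.relu(self.sent_fusion(feats))
        z_emo = torch.relu(self.emo_fusion(feats))
        return self.sent_head(z_sent).squeeze(-1), self.emo_head(z_emo), z_sent, z_emo


def joint_loss(sent_pred, sent_true, emo_logits, emo_true, z_sent, z_emo,
               lam=1.0, alpha=0.1):
    """Joint objective: per-task losses plus a simple redundancy penalty that
    discourages the two fused representations from collapsing onto the same
    features (an assumed stand-in for the paper's regularizers)."""
    l_sent = nn.functional.mse_loss(sent_pred, sent_true)
    l_emo = nn.functional.binary_cross_entropy_with_logits(emo_logits, emo_true)
    redundancy = nn.functional.cosine_similarity(z_sent, z_emo, dim=-1).abs().mean()
    return l_sent + lam * l_emo + alpha * redundancy
```

Optimizing the two task losses through shared encoders is what lets the tasks inform each other, while the penalty on the similarity of the two fused representations illustrates one way a regularizer can push the model toward distinct, less redundant fusion results.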
