Abstract

Deciphering sentiments or emotions in face-to-face human interaction is an inherent capability of human intelligence, and thus a natural goal of artificial intelligence. The proliferation of multimedia data on video sites gives rise to multimodal sentiment analysis across applications and research fields such as movie and product reviews, opinion polling, and affective computing. To improve performance on the multimodal sentiment analysis task, this paper proposes a novel neural network with a multiple stacked attention mechanism (MSAM) that operates on multimodal data containing text, video, and audio at the utterance level. We conduct experiments on two benchmark datasets, the CMU Multimodal Opinion-level Sentiment Intensity (CMU-MOSI) corpus and the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) corpus. Compared with a comprehensive set of state-of-the-art baselines, the evaluation results demonstrate the effectiveness of the proposed MSAM network.
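To make the idea of stacking attention over utterance-level multimodal features concrete, below is a minimal PyTorch sketch. It assumes pre-extracted text, audio, and video feature vectors per utterance; the dimensions, layer counts, pooling, and head counts are illustrative assumptions and do not reproduce the authors' exact MSAM architecture.

```python
# Sketch: stacked attention fusion over utterance-level text/audio/video
# features. All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn


class StackedAttentionFusion(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, video_dim=35,
                 hidden_dim=128, num_layers=2, num_classes=2):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, hidden_dim),
            "audio": nn.Linear(audio_dim, hidden_dim),
            "video": nn.Linear(video_dim, hidden_dim),
        })
        # A stack of attention layers applied over the three modality tokens.
        self.attn_layers = nn.ModuleList([
            nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
            for _ in range(num_layers)
        ])
        self.norms = nn.ModuleList(
            [nn.LayerNorm(hidden_dim) for _ in range(num_layers)])
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text, audio, video):
        # Each input: (batch, modality_dim) utterance-level features.
        tokens = torch.stack(
            [self.proj["text"](text),
             self.proj["audio"](audio),
             self.proj["video"](video)], dim=1)   # (batch, 3, hidden)
        for attn, norm in zip(self.attn_layers, self.norms):
            attended, _ = attn(tokens, tokens, tokens)
            tokens = norm(tokens + attended)      # residual + layer norm
        fused = tokens.mean(dim=1)                # pool the modality tokens
        return self.classifier(fused)


if __name__ == "__main__":
    model = StackedAttentionFusion()
    text = torch.randn(4, 300)   # e.g. averaged word embeddings per utterance
    audio = torch.randn(4, 74)   # e.g. acoustic features per utterance
    video = torch.randn(4, 35)   # e.g. facial-expression features per utterance
    print(model(text, audio, video).shape)        # torch.Size([4, 2])
```

Stacking several attention layers lets later layers re-weight cross-modal interactions discovered by earlier layers, which is the general intuition behind a stacked attention mechanism for fusion.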
