Abstract

The goal of multimodal aspect-based sentiment analysis is to comprehensively utilize data from different modalities (e.g., text and image) to identify aspect-specific sentiment polarity. Existing works have proposed many methods for fusing text and image information and achieved satisfactory results. However, they fail to filter noise in the image information and ignore the progressive learning process of sentiment features. To solve these problems, we propose an interactive fusion network with recurrent attention. Specifically, we first use two encoders to encode the text and image data, respectively. Then we use the attention mechanism to obtain the semantic information of the image at the token level. Next, we employ a GRU to filter out the noise in the image and fuse information from the different modalities. Finally, we design a decoder with recurrent attention to progressively learn aspect-specific sentiment features for classification. The results on two Twitter datasets show that our method outperforms all baselines.

Keywords: Multimodal aspect-based sentiment analysis, Attention mechanism, Progressive learning
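The abstract walks through four stages: modality-specific encoding, token-level cross-modal attention, GRU-based noise filtering and fusion, and a recurrent-attention decoder. Below is a minimal PyTorch sketch of that pipeline, assuming the paper's setup; all module choices, dimensions, and the number of attention hops are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InteractiveFusionNet(nn.Module):
    """Sketch of the described architecture; hyperparameters are hypothetical."""
    def __init__(self, d_model=768, img_dim=2048, n_heads=8, hops=3, n_classes=3):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)            # project image regions
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)  # filter visual noise, fuse
        self.hop_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.hops = hops
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_feats, img_feats, aspect_feats):
        # text_feats:   (B, T, d)       token embeddings from the text encoder
        # img_feats:    (B, R, img_dim) region features from the image encoder
        # aspect_feats: (B, A, d)       embeddings of the aspect term
        img = self.img_proj(img_feats)
        # Token-level image semantics: each text token attends over image regions.
        img_per_token, _ = self.cross_attn(text_feats, img, img)
        # A GRU pass over the token-aligned visual stream filters noise and fuses it
        # with the textual stream.
        fused, _ = self.gru(text_feats + img_per_token)
        # Recurrent attention: repeatedly refine an aspect query over fused features,
        # progressively accumulating aspect-specific sentiment evidence.
        query = aspect_feats.mean(dim=1, keepdim=True)         # (B, 1, d)
        for _ in range(self.hops):
            ctx, _ = self.hop_attn(query, fused, fused)
            query = query + ctx                                # progressive refinement
        return self.classifier(query.squeeze(1))               # (B, n_classes)
```

The loop over `hops` is one plausible reading of "recurrent attention": rather than attending once, the aspect query is updated after each hop so later hops can focus on sentiment cues the earlier ones surfaced.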
