Abstract

Multimodal sentiment analysis (MSA) is a challenging task due to the complex and complementary interactions between multiple modalities, and it can be widely applied in areas such as product marketing and public opinion monitoring. However, previous works directly utilized the features extracted from multimodal data, largely ignoring noise reduction within and across modalities before multimodal fusion. This paper proposes a multi-level attention map network (MAMN) to filter noise before multimodal fusion and to capture the consistent and heterogeneous correlations among multi-granularity features for multimodal sentiment analysis. Architecturally, MAMN comprises three modules: a multi-granularity feature extraction module, a multi-level attention map generation module, and an attention map fusion module. The first module extracts multi-granularity features from multimodal data; the second filters noise and enhances the representational ability of the multi-granularity features before fusion; and the third extensibly mines the interactions among multi-level attention maps through the proposed extensible co-attention fusion method. Extensive experimental results on three public datasets show that the proposed model significantly outperforms state-of-the-art methods, and demonstrate its effectiveness on both document-based and aspect-based MSA tasks.
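
The abstract does not disclose implementation details of the extensible co-attention fusion method, but the following minimal PyTorch sketch illustrates the general idea of co-attention fusion between two modality feature sequences. All names here (CoAttentionFusion, feats_a, feats_b, the dimensions) are hypothetical illustrations under assumed shapes, not the paper's actual MAMN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionFusion(nn.Module):
    """Fuses two modality feature sequences via bidirectional co-attention
    (an illustrative sketch, not the paper's extensible variant)."""

    def __init__(self, dim_a, dim_b, dim_hidden):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_hidden)
        self.proj_b = nn.Linear(dim_b, dim_hidden)
        self.out = nn.Linear(2 * dim_hidden, dim_hidden)

    def forward(self, feats_a, feats_b):
        # feats_a: (batch, len_a, dim_a), e.g. text tokens
        # feats_b: (batch, len_b, dim_b), e.g. image regions
        a = self.proj_a(feats_a)                        # (B, La, H)
        b = self.proj_b(feats_b)                        # (B, Lb, H)
        # Affinity between every cross-modal position pair.
        affinity = torch.bmm(a, b.transpose(1, 2))      # (B, La, Lb)
        # Attend to b from a's positions, and to a from b's positions.
        attn_a2b = F.softmax(affinity, dim=2)
        attn_b2a = F.softmax(affinity, dim=1)
        b_ctx = torch.bmm(attn_a2b, b)                  # (B, La, H)
        a_ctx = torch.bmm(attn_b2a.transpose(1, 2), a)  # (B, Lb, H)
        # Pool each modality's attended context and fuse.
        fused = torch.cat([b_ctx.mean(dim=1), a_ctx.mean(dim=1)], dim=-1)
        return self.out(fused)                          # (B, H)

# Usage: fuse 16 text-token features with 36 image-region features.
fusion = CoAttentionFusion(dim_a=768, dim_b=2048, dim_hidden=256)
text = torch.randn(4, 16, 768)
image = torch.randn(4, 36, 2048)
joint = fusion(text, image)  # (4, 256), fed to a sentiment classifier
```

Because each direction of attention is computed from a shared affinity matrix, a scheme like this extends naturally to additional modality pairs or attention levels, which is presumably what motivates an "extensible" fusion design.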
