Abstract

Aspect-based sentiment analysis has achieved great success in recent years. Most existing work focuses on determining the sentiment polarity of a given aspect according to the accompanying text, while little attention has been paid to visual information or multimodal content for aspect-based sentiment analysis. Multimodal content is becoming increasingly popular on mainstream online social platforms and can help to better capture user sentiment toward a given aspect. Only a few studies have focused on this new task: Multimodal Aspect-based Sentiment Analysis (MASA), which performs aspect-based sentiment analysis by integrating both texts and images. In this paper, we propose a multimodal interaction model for MASA that learns the relationships among the text, image, and aspect via interaction layers and adversarial training. Additionally, we build a new large-scale dataset for this task, named MASAD, which covers seven domains and 57 aspect categories with 38k image-text pairs. Extensive experiments have been conducted on the proposed dataset to provide several baselines for this task. Although our models achieve significant improvements on this task, empirical results show that MASA is more challenging than textual aspect-based sentiment analysis, which indicates that MASA remains a challenging open problem and requires further effort.
