Abstract

Multimodal sentiment analysis, which aims to recognize the emotions expressed in multimodal data, has attracted extensive attention in both academia and industry. However, most current studies on user-generated reviews classify the overall sentiment of a review and rarely consider the specific aspects that users comment on. In addition, user-generated reviews on social media are usually dominated by short opinionated texts, sometimes accompanied by images that complement or reinforce the expressed emotion. Based on this observation, we propose a visual enhancement capsule network (VECapsNet) based on multimodal fusion for the task of aspect-based sentiment analysis. Firstly, an adaptive mask memory capsule network is designed to extract local clustering information from opinion text. Then, an aspect-guided visual attention mechanism is constructed to obtain the image information related to the aspect phrases. Finally, a multimodal fusion module based on interactive learning is presented for multimodal sentiment classification; it takes the aspect phrases as query vectors and iteratively captures the multimodal features correlated with the affective entities over multiple rounds of learning. Moreover, because few multimodal aspect-based sentiment review datasets are currently available, we build a large-scale multimodal aspect-based sentiment dataset of Chinese restaurant reviews, called MTCom. Extensive experiments on both single-modal and multimodal datasets demonstrate that our model better captures local aspect-based sentiment features and is more applicable to general multimodal user reviews than existing methods. The experimental results verify the effectiveness of the proposed VECapsNet.
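To make the aspect-guided visual attention step more concrete, the following is a minimal PyTorch sketch, not the authors' implementation: an aspect-phrase embedding serves as the query over a set of image-region features, and a weighted visual summary is returned. The module name, dimensions, projections, and scaled dot-product scoring are all illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AspectGuidedVisualAttention(nn.Module):
    """Illustrative sketch: an aspect-phrase embedding queries image-region
    features and returns an aspect-conditioned visual summary. All design
    choices here (projection sizes, scoring function) are assumptions."""

    def __init__(self, aspect_dim: int, region_dim: int, hidden_dim: int):
        super().__init__()
        self.query_proj = nn.Linear(aspect_dim, hidden_dim)
        self.key_proj = nn.Linear(region_dim, hidden_dim)
        self.value_proj = nn.Linear(region_dim, hidden_dim)

    def forward(self, aspect_emb: torch.Tensor, region_feats: torch.Tensor):
        # aspect_emb:   (batch, aspect_dim)            pooled aspect-phrase embedding
        # region_feats: (batch, n_regions, region_dim) e.g. CNN image-region features
        q = self.query_proj(aspect_emb).unsqueeze(1)       # (batch, 1, hidden)
        k = self.key_proj(region_feats)                    # (batch, n_regions, hidden)
        v = self.value_proj(region_feats)                  # (batch, n_regions, hidden)
        scores = torch.bmm(q, k.transpose(1, 2)) / k.size(-1) ** 0.5
        weights = F.softmax(scores, dim=-1)                # attention over regions
        visual_summary = torch.bmm(weights, v).squeeze(1)  # (batch, hidden)
        return visual_summary, weights.squeeze(1)


if __name__ == "__main__":
    attn = AspectGuidedVisualAttention(aspect_dim=300, region_dim=2048, hidden_dim=256)
    aspect = torch.randn(4, 300)        # batch of 4 aspect-phrase embeddings
    regions = torch.randn(4, 49, 2048)  # e.g. a 7x7 grid of image regions per review image
    summary, weights = attn(aspect, regions)
    print(summary.shape, weights.shape)  # torch.Size([4, 256]) torch.Size([4, 49])
```

In the fusion module described in the abstract, such an aspect-conditioned visual summary would be combined with the text-side capsule features and refined over multiple query rounds; the sketch above covers only the visual-attention step.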
