Abstract

In recent years, with the popularity of social media, users have become increasingly keen to express their feelings and opinions through pictures and text, making multimodal data that combines text and images the fastest-growing content type. Most of the information users post on social media carries clear sentiment, and multimodal sentiment analysis has therefore become an important research field. Previous studies on multimodal sentiment analysis have primarily extracted text and image features separately and then combined them for sentiment classification, often ignoring the interaction between text and images. Therefore, this paper proposes a new multimodal sentiment analysis model. The model first eliminates noise interference in the textual data and extracts the more important image features. In the attention-based feature-fusion stage, the text and image modalities symmetrically learn internal features from each other, and the fused features are then applied to the sentiment classification task. Experimental results on two common multimodal sentiment datasets demonstrate the effectiveness of the proposed model.

Highlights

  • With the growing popularity of social media, people are increasingly keen to express their views and opinions on these platforms

  • To address these problems, this paper proposes a multimodal sentiment classification model based on a fine-grained attention mechanism

  • We propose a new multimodal cross-feature fusion model based on the attention mechanism (CFF-ATT), which can effectively fuse the features of different modalities and provide more effective and accurate information for sentiment classification
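As an illustration of the symmetric, attention-based cross-feature fusion described above, the sketch below shows one way such a fusion module could be implemented. The class name, feature dimensions, pooling choice, and use of PyTorch's multi-head attention are assumptions made for readability, not the authors' released CFF-ATT implementation.

```python
# Illustrative sketch (not the paper's released code): a symmetric
# cross-attention fusion block in PyTorch. Names and sizes are assumed.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuses text and image features by letting each modality attend to the other."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Text queries attend over image regions, and vice versa.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 3),  # e.g. negative / neutral / positive
        )

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, num_tokens,  dim), e.g. from a text encoder
        # image_feats: (batch, num_regions, dim), e.g. from a CNN region extractor
        text_attended, _ = self.text_to_image(text_feats, image_feats, image_feats)
        image_attended, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Pool each attended sequence and concatenate the two modality views.
        fused = torch.cat(
            [text_attended.mean(dim=1), image_attended.mean(dim=1)], dim=-1
        )
        return self.classifier(fused)


# Minimal usage example with random tensors standing in for encoder outputs.
model = CrossAttentionFusion()
logits = model(torch.randn(8, 20, 256), torch.randn(8, 49, 256))
print(logits.shape)  # torch.Size([8, 3])
```

The two attention passes are mirror images of one another: each modality serves as the query over the other modality's features, and the pooled outputs are concatenated before classification.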



Introduction

With the increasing popularity of social media, people are increasingly keen to express their views or opinions on social media platforms. A large volume of this content takes the form of text and image combinations, which constitute a huge amount of multimodal data. In the image-text posts on social media, the text and the images each contain sentiment information that is distinct from and complementary to the other. Compared with single-modality data consisting of text or an image alone, multimodal data contains more information and can better reveal users' real feelings. Because the sentiment information carried by each modality differs, an effective sentiment-feature representation must be obtained from each modality for sentiment analysis. Not all regions of an image are related to sentiment expression, and not all words in a text are related to sentiment; each modality therefore expresses underlying features of different dimensions and attributes. Sentiment analysis of single-modality text has primarily relied on traditional statistical methods, which are highly dependent on the quality of …
