Abstract

In recent years, a growing number of people have taken to expressing their feelings and opinions on social media through text and pictures, so the volume of multimodal data combining the two is increasing. Analyzing the sentiment of such multimodal data makes it possible to understand people’s attitudes and opinions. To address the problem of information redundancy in multimodal sentiment classification, first, an image feature extraction model is built on an attention neural network, which highlights the regions of the image that carry key sentiment information. Second, using the tensor fusion method, the tensor product of the text and image modality features serves as the joint feature representation of the multimodal data. Finally, a specific information extraction module is designed to extract the fused features and eliminate the redundant information in the joint representation. Experiments on two real-world Twitter image-text datasets demonstrate that the proposed model outperforms existing models in sentiment classification.
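The first two steps described above, attention over image regions followed by tensor fusion, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dot-product attention scoring, the constant 1 appended to each vector before the outer product (a common tensor-fusion convention that also keeps the unimodal features in the result), the function names `attention_pool` and `tensor_fusion`, and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

def attention_pool(regions, query):
    """Softmax-weighted average of region features.

    Assumption: regions are scored by a dot product with a query vector;
    the abstract does not specify the attention form.
    """
    scores = regions @ query                 # one score per region
    scores = scores - scores.max()           # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ regions, weights        # pooled feature, attention weights

def tensor_fusion(text_vec, image_vec):
    """Outer (tensor) product of the two modality vectors.

    Assumption: each vector is extended with a constant 1 so the joint
    tensor also contains the original unimodal features.
    """
    t = np.concatenate([text_vec, [1.0]])
    v = np.concatenate([image_vec, [1.0]])
    return np.outer(t, v)                    # shape (d_text + 1, d_img + 1)

# Illustrative random inputs; sizes are hypothetical.
rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 8))           # 49 image regions, 8-dim features
query = rng.normal(size=8)
text_vec = rng.normal(size=6)

image_vec, weights = attention_pool(regions, query)
joint = tensor_fusion(text_vec, image_vec)
print(joint.shape)  # (7, 9)
```

The joint tensor grows multiplicatively with the two feature sizes, which is one way to see why a later module that removes redundant information from the fused features is useful.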

