Abstract

In recent years, a growing number of people have expressed their feelings and opinions on social media using pictures and text together, so the volume of multimodal data consisting mainly of images and text keeps increasing. Compared with monomodal data, multimodal data carries more information and can better reveal users' real feelings. Analyzing the emotion in this massive multimodal data helps in understanding people's attitudes and viewpoints and has a wide range of applications. To address the problem of information redundancy in multimodal emotion classification, a multimodal emotion analysis method based on an attention neural network is proposed, building on the tensor fusion scheme. The method constructs a text feature extraction model and an image feature extraction model, both based on attention neural networks, which highlight the image regions and the words that carry emotional information, making the single-modal feature representations more concise and accurate. The tensor product of the modal features is taken as the joint feature representation of the multimodal data, redundant information in the joint features is eliminated by principal component analysis, and the emotion category of the multimodal data is then obtained with a support vector machine. The proposed model is evaluated on two real-world Twitter image data sets. The experimental results show that, compared with other emotion classification models, the method substantially improves classification accuracy, recall, and F1 score.
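The fusion and redundancy-reduction steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature dimensions, the random stand-in features, and the helper names `tensor_fuse` and `pca_reduce` are assumptions made for illustration. The attention-based feature extractors and the final support vector machine classifier are omitted.

```python
import numpy as np

def tensor_fuse(text_feats, image_feats):
    """Per-sample outer (tensor) product of two feature sets, flattened."""
    # (n, d_t) and (n, d_i) -> (n, d_t, d_i) -> (n, d_t * d_i)
    joint = np.einsum("nt,ni->nti", text_feats, image_feats)
    return joint.reshape(joint.shape[0], -1)

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)  # center each feature column
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Hypothetical stand-ins for attention-pooled single-modal features.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(20, 4))
image_feats = rng.normal(size=(20, 3))

joint = tensor_fuse(text_feats, image_feats)  # joint features, shape (20, 12)
reduced = pca_reduce(joint, k=5)              # compressed features, shape (20, 5)
```

In a full pipeline, the rows of `reduced` would be passed to a classifier such as a support vector machine. The number of retained components (`k=5` here) is an arbitrary choice for the sketch.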
