Abstract

Visual sentiment analysis aims to recognize emotions from visual content. It is a useful yet challenging task, especially when fine-grained emotions (such as love, joy, surprise, sadness, fear, anger, disgust, and anxiety) are analyzed. Existing methods based on convolutional neural networks learn sentiment representations from global visual features, ignoring the fact that both the local regions of an image and the relationships among them influence sentiment representation learning. To address this limitation, this paper proposes a new Multi-Attentive Pyramidal (MAP) model for visual sentiment analysis. The model performs pyramidal segmentation and pooling on the visual feature blocks produced by a fully convolutional network, extracting local visual features from multiple regions at different scales of the global image. It then applies a self-attention mechanism to mine the associations between the local visual features and obtain the final sentiment representation. Extensive experiments on six benchmark datasets show that the proposed MAP model outperforms state-of-the-art methods in visual sentiment analysis.
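
To make the two stages concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: pyramidal pooling over FCN feature blocks to extract multi-scale region features, followed by self-attention over those regions. The class name, pyramid levels, embedding size, head count, and output dimensions are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: multi-scale pyramidal pooling + self-attention over
# region features. All hyperparameters below are illustrative guesses.
import torch
import torch.nn as nn

class PyramidSelfAttention(nn.Module):
    def __init__(self, in_channels=512, embed_dim=256,
                 pyramid_levels=(1, 2, 4), num_classes=8):
        super().__init__()
        self.levels = pyramid_levels
        self.proj = nn.Linear(in_channels, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, feats):                      # feats: (B, C, H, W)
        regions = []
        for k in self.levels:
            # Average-pool the feature map into a k x k grid of regions.
            pooled = nn.functional.adaptive_avg_pool2d(feats, k)
            regions.append(pooled.flatten(2).transpose(1, 2))  # (B, k*k, C)
        tokens = self.proj(torch.cat(regions, dim=1))          # (B, R, D)
        # Self-attention mines pairwise associations between regions.
        attended, _ = self.attn(tokens, tokens, tokens)
        # Mean-pool the attended regions into one sentiment representation.
        return self.head(attended.mean(dim=1))

# Example: feature blocks from an FCN backbone for a batch of 2 images.
logits = PyramidSelfAttention()(torch.randn(2, 512, 14, 14))
print(logits.shape)  # torch.Size([2, 8]) -> 8 fine-grained emotions
```

The 1/2/4 pyramid yields 21 region tokens per image, so attention weights can relate, say, a 4x4 cell to the whole-image token; how MAP actually fuses the attended regions into the final representation is not specified by the abstract, and mean pooling here is just one plausible choice.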
