Abstract

Visual sentiment analysis aims to recognize emotions from visual content. It is a useful yet challenging task, especially when fine-grained emotions (such as love, joy, surprise, sadness, fear, anger, disgust, and anxiety) are analyzed. Existing methods based on convolutional neural networks learn sentiment representations from global visual features, ignoring the fact that both the local regions of an image and the relationships between those regions can affect sentiment representation learning. To address this limitation, in this paper we propose a new MultiAttentive Pyramidal model (MAP) for visual sentiment analysis. The model performs pyramidal segmentation and pooling on the visual feature blocks obtained from a fully convolutional network, extracting local visual features from multiple regions at different scales of the global image. It then applies a self-attention mechanism to mine the associations between the local visual features and obtain the final sentiment representation. Extensive experiments on six benchmark datasets show that the proposed MAP model outperforms state-of-the-art methods in visual sentiment analysis.
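The two core operations the abstract describes, pyramidal pooling of a convolutional feature map into multi-scale local features, followed by self-attention over those features, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the scale set (1, 2, 4), the use of average pooling, and the plain scaled dot-product attention without learned query/key/value projections are all illustrative assumptions.

```python
import numpy as np

def pyramid_pool(feat, scales=(1, 2, 4)):
    """Split an H x W x C feature map into an s x s grid of regions at
    each scale s and average-pool each region into one C-dim local feature.
    (Scale set and pooling choice are assumptions, not the paper's spec.)"""
    H, W, C = feat.shape
    local_feats = []
    for s in scales:
        hs, ws = H // s, W // s
        for i in range(s):
            for j in range(s):
                region = feat[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws, :]
                local_feats.append(region.mean(axis=(0, 1)))
    return np.stack(local_feats)          # (N, C), N = sum of s*s over scales

def self_attention(X):
    """Plain scaled dot-product self-attention over the N local features
    (no learned projections, for illustration only)."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)          # pairwise association scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)      # row-wise softmax
    return A @ X                           # (N, C) attended local features

# Toy 8x8 feature map with 16 channels standing in for FCN output.
feat = np.random.rand(8, 8, 16)
local_feats = pyramid_pool(feat)           # 1 + 4 + 16 = 21 local features
rep = self_attention(local_feats).mean(axis=0)   # final C-dim representation
```

In the actual model, the feature map would come from a fully convolutional backbone and the attended features would feed a sentiment classifier; here the mean over attended features merely stands in for that final aggregation step.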

