Abstract

Visual sentiment analysis aims to understand the sentiment evoked by images, which is an important yet challenging task. Existing methods focus on learning single-scale sentiment features from the whole image. However, it has been shown that image emotions are evoked by sentiment-specific regions of the image, which are related to both high-level and low-level sentiment features. To exploit multi-scale sentiment features from both holistic and localized information, we propose a novel end-to-end multi-task framework for joint sentiment-specific region detection and sentiment classification. Our method contains three components: a Multi-Scale Features Extractor, a Sentiment-Specific Regions Detection Branch, and a Sentiment Classification Branch. In the Multi-Scale Features Extractor, the proposed approach first generates a holistic feature carrying more sentiment-related information by fusing multi-scale features extracted from convolutional neural networks via a Feature Pyramid Network (FPN) and Adaptively Spatial Feature Fusion (ASFF). Then, in the Sentiment-Specific Regions Detection Branch, a semantic map is automatically discovered from the enhanced holistic feature through an attention mechanism, without requiring any manual region annotations. Finally, the Sentiment Classification Branch adaptively integrates the localized and holistic information for the final sentiment prediction. Extensive experiments on public benchmark datasets demonstrate the robustness and effectiveness of our method in visual sentiment analysis.
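The core idea behind ASFF-style fusion is to combine feature maps from several scales with softmax-normalized weights so that the network can adaptively emphasize the most sentiment-relevant scale. The following is a minimal illustrative sketch of that weighting scheme only, not the paper's implementation: the function names (`softmax`, `adaptive_fuse`), the scalar per-scale weights, and the flat feature vectors are simplifying assumptions (in the actual model the weights are spatial maps predicted by convolutions, and the features are already resized to a common resolution).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(scale_features, scale_logits):
    """Fuse per-scale feature vectors with softmax-normalized weights.

    scale_features: one feature vector per scale (all the same length,
                    i.e. already aligned to a common resolution).
    scale_logits:   one unnormalized weight per scale (learned in the
                    real model; supplied here for illustration).
    Returns the weighted sum of the features and the weights used.
    """
    weights = softmax(scale_logits)
    dim = len(scale_features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, scale_features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused, weights
```

For example, with three scales and equal logits, each scale contributes a weight of 1/3, so the fused feature is the average of the three vectors; a large logit for one scale would shift the fusion toward that scale's features.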
