Abstract

Visual sentiment analysis (VSA) is a challenging task that has attracted wide attention from researchers for its great application potential. Existing works for VSA mostly extract global representations of images for sentiment prediction, ignoring the different contributions of local regions. Some recent studies analyze local regions separately and improve sentiment prediction performance. However, most of them either treat regions equally in the feature fusion process, ignoring their distinct contributions, or rely on a global attention map whose performance is easily degraded by noise from non-emotional regions. In this paper, to address these problems, we propose an end-to-end deep framework that effectively exploits the contributions of local regions to VSA. Specifically, a Sentiment Region Attention (SRA) module is proposed to estimate the contribution of each local region to the global image sentiment. Features of these regions are then reweighted and fused according to their estimated contributions. Moreover, since image sentiment is usually closely related to the humans appearing in the image, we also propose to model the contribution of human faces as a special local region for sentiment prediction. Experimental results on publicly available and widely used VSA datasets demonstrate that our method outperforms state-of-the-art algorithms.
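
To make the described mechanism concrete, the following is a minimal sketch, not the authors' implementation, of region-level attention weighting and fusion as outlined above: each local region feature is scored against the global image representation, and region features are reweighted and summed according to those scores. The class name SentimentRegionAttention, the layer sizes, and the use of a small MLP scorer are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SentimentRegionAttention(nn.Module):
    """Hypothetical region-attention block: scores each region against the
    global representation, then fuses the reweighted region features."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Score each (global, region) feature pair with a small MLP (assumed design).
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, global_feat: torch.Tensor, region_feats: torch.Tensor):
        # global_feat: (B, D); region_feats: (B, R, D) for R candidate regions
        B, R, D = region_feats.shape
        g = global_feat.unsqueeze(1).expand(-1, R, -1)           # (B, R, D)
        scores = self.scorer(torch.cat([g, region_feats], -1))   # (B, R, 1)
        weights = F.softmax(scores, dim=1)                       # estimated contributions
        fused = (weights * region_feats).sum(dim=1)              # (B, D) fused feature
        return fused, weights.squeeze(-1)


if __name__ == "__main__":
    sra = SentimentRegionAttention(feat_dim=512)
    g = torch.randn(2, 512)       # global image features
    r = torch.randn(2, 6, 512)    # e.g. 5 region proposals plus 1 face region
    fused, w = sra(g, r)
    print(fused.shape, w.shape)   # torch.Size([2, 512]) torch.Size([2, 6])

In this sketch the face feature is simply appended as one more region before fusion; the paper treats faces as a special local region, and how exactly that feature is extracted and combined is not specified in the abstract.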
