Abstract
Recently, many methods that exploit attention mechanism to discover the relevant local regions via visual attributes, have demonstrated promising performance in visual sentiment prediction. In these methods, accurate detection of visual attributes is of vital importance to identify the sentiment relevant regions, which is crucial for successful assessment of visual sentiment. However, existing work merely utilize basic strategies on convolutional neural network for visual attribute detection and fail to obtain satisfactory results due to the semantic gap between visual features and subjective attributes. Moreover, it is difficult for existing attention models to localize subtle sentiment relevant regions, especially when the performance of attribute detection is relatively poor. To address these problems, we first design a multi-task learning based approach for visual attribute detection. By augmenting the attributes with sentiments supervision, the semantic gap can be effectively reduced. We then develop a multi-attention model for jointly discovering and localizing multiple relevant local regions given predicted attributes. The classifier built on top of these regions achieves a significant improvement in visual sentiment prediction. Experimental results demonstrate the superiority of our method against previous approaches.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.