Abstract

Images can convey rich semantic information and arouse strong emotions in viewers. With the growing use of online images and videos to express opinions, inferring emotions from visual content has attracted considerable attention. Image emotion recognition aims to automatically classify the emotions an image conveys. Existing image sentiment classification studies, whether based on hand-crafted features or deep models, focus mainly on either low-level visual features or high-level semantic representations, without jointly considering both. In this paper, we use visualization to study how deep representations behave in emotion recognition. Our analysis shows that deep models rely chiefly on deep semantic information while ignoring shallow visual details, which are also essential to evoking emotions. To form a more discriminative representation for emotion recognition, we propose a multi-level representation model with side branches that learns and integrates representations from different depths of the backbone for sentiment analysis. Unlike a hierarchical CNN structure, our model describes images from deep semantic representations down to shallow visual ones. We also analyze and discuss several feature fusion approaches for optimizing the model. Extensive experiments on several image emotion recognition datasets show that our model outperforms various existing methods.
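To make the architectural idea concrete, the sketch below shows one plausible reading of a multi-level representation model with side branches: branches tap a CNN backbone at several depths, and their pooled features are fused for emotion classification. This is not the authors' released code; the backbone choice (ResNet-50), branch placement, projection dimension, fusion by concatenation, and the 8-class output are all illustrative assumptions.

```python
# A minimal sketch (assumptions noted above, not the paper's exact design):
# side branches expose features at several backbone depths, so shallow
# visual details and deep semantics both reach the classifier.
import torch
import torch.nn as nn
import torchvision.models as models


class MultiLevelEmotionNet(nn.Module):
    def __init__(self, num_classes: int = 8, branch_dim: int = 256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Split the backbone into stages so intermediate features are exposed.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        stage_channels = [256, 512, 1024, 2048]  # ResNet-50 stage outputs
        # One side branch per stage: global pooling plus a linear projection,
        # giving every depth level an equal-sized representation.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(c, branch_dim), nn.ReLU(inplace=True))
            for c in stage_channels
        ])
        # Fusion by concatenation; the paper discusses several fusion schemes.
        self.classifier = nn.Linear(branch_dim * len(stage_channels),
                                    num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        level_feats = []
        for stage, branch in zip(self.stages, self.branches):
            x = stage(x)                  # advance one backbone stage
            level_feats.append(branch(x))  # tap this depth via a side branch
        return self.classifier(torch.cat(level_feats, dim=1))


# Usage: logits over emotion classes for a batch of two 224x224 images.
logits = MultiLevelEmotionNet()(torch.randn(2, 3, 224, 224))
```

Concatenation is only one of the fusion options the abstract alludes to; element-wise summation or learned weighting over the branch outputs would slot into the same structure.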
