Abstract
Image emotion recognition aims to automatically categorize the emotion conveyed by an image. Recent research on image emotion recognition has demonstrated the potential of deep representations. To better understand how CNNs work in emotion recognition, in this work we investigate the deep features by visualizing them. This study shows that deep models mainly rely on the image content but miss the image style information, such as color, texture, and shape, which are low-level visual features that are vital for evoking emotions. To form a more discriminative representation for emotion recognition, we propose a novel CNN model that learns and integrates the content information from the high layers of the deep network with the style information from the lower layers. The uncertainty of image emotion labels is also investigated in this paper. Rather than using the emotion labels directly for training, as in previous work, we design a new loss function that incorporates the quality of the emotion labels to optimize the proposed inference model. Extensive experiments on benchmark datasets are conducted to demonstrate the superiority of the proposed representation.
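The two ideas summarized above, fusing low-layer style cues with high-layer content cues and weighting the loss by label quality, can be sketched roughly as follows. This is a toy NumPy sketch under stated assumptions, not the authors' architecture: `gram_matrix` as the style descriptor, global average pooling as the content descriptor, and a per-sample `quality` weight are all illustrative choices.

```python
import numpy as np

def gram_matrix(feat):
    """Style descriptor: channel-wise Gram matrix of a (C, H, W) feature map,
    capturing correlations between channels (color/texture statistics)."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)

def fuse_features(low_feat, high_feat):
    """Concatenate style information from a low layer with content
    information from a high layer into one representation."""
    # Upper triangle of the Gram matrix (it is symmetric, so this deduplicates).
    style = gram_matrix(low_feat)[np.triu_indices(low_feat.shape[0])]
    # Global average pooling of the high-layer map as a content summary.
    content = high_feat.mean(axis=(1, 2))
    return np.concatenate([style, content])

def quality_weighted_loss(logits, label, quality):
    """Cross-entropy scaled by a per-sample label-quality weight in [0, 1]:
    uncertain labels contribute less to the training signal."""
    z = logits - logits.max()              # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -quality * log_probs[label]

# Example: an 8-channel low-layer map and a 32-channel high-layer map.
low = np.random.rand(8, 16, 16)
high = np.random.rand(32, 4, 4)
fused = fuse_features(low, high)           # 36 style dims + 32 content dims
loss = quality_weighted_loss(np.array([1.0, 2.0, 0.5]), 1, quality=0.8)
```

In a real network the two feature maps would come from early and late convolutional layers of the same CNN, and the fused vector would feed a classification head.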