Abstract

Image emotion classification (IEC) is an important computer vision task that extracts emotions from images. Existing IEC methods are primarily supervised by single labels or label distributions, which lack both accessibility and diversity, limiting the development of IEC research. Inspired by psychology research and the recent success of large-scale pretrained language models, we propose a language-supervised paradigm that combines linguistic and visual emotion features, driving the visual model to gain stronger emotional discernment through language prompts. To realize this paradigm, we present a conceptually simple yet empirically powerful framework for image emotion classification, SimEmotion, which adopts a prompt-based fine-tuning strategy to learn task-specific representations by composing a template from an emotion-level concept and entity-level information. Evaluations on four widely used affective datasets, namely Flickr and Instagram (FI), EmotionROI, Twitter I, and Twitter II, demonstrate that the proposed algorithm outperforms state-of-the-art methods by a large margin (i.e., an 8.42% absolute accuracy gain on EmotionROI) on image emotion classification tasks. Our code will be publicly available for research purposes.
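The prompt-composition idea described above can be sketched as follows. This is a minimal illustrative example, assuming a CLIP-style setup in which one text prompt is generated per candidate emotion class and scored against the image; the template wording, emotion vocabulary, and function names are assumptions for illustration, not the paper's exact implementation.

```python
# Hypothetical sketch: compose a text prompt from an emotion-level
# concept and entity-level information (e.g. an object detected in the
# image). A language-image model would score the image against each
# candidate prompt and predict the best-matching emotion class.

# Eight-class emotion vocabulary commonly used in affective datasets
# such as FI (an assumption; the paper's label set may differ).
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]

def compose_prompt(emotion: str, entity: str) -> str:
    """Fill a text template with an emotion concept and an entity."""
    return f"a photo of {entity} that evokes {emotion}"

def candidate_prompts(entity: str) -> list:
    """Build one candidate prompt per emotion class."""
    return [compose_prompt(e, entity) for e in EMOTIONS]

prompts = candidate_prompts("a smiling child")
```

Each image would thus yield one prompt per emotion class, and fine-tuning aligns the image embedding with the prompt of its ground-truth emotion.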
