Abstract

Visual emotion analysis (VEA), which aims to detect the emotions behind images, has gained increasing attention with the growth of online social media. Recent studies in prompt learning have significantly advanced visual emotion classification. However, these methods typically use random vectors or non-emotional texts to initialize prompt optimization, which restricts the emotional semantics that prompts can represent and limits model performance. To tackle this problem, we leverage emotional prompts from multiple views to enrich the semantic emotional information. We first translate the image into a caption, used as a context prompt (COP), from the view of the image's background information. Additionally, we introduce a hybrid emotion prompt (HEP) from the view of the interaction between emotional visual and textual information, where the two modalities are integrated with a novel Emotion Joint Congruity Learning module. Furthermore, we provide a label prompt (LP) to strengthen the emotional association with labels, enabling better fusion of emotional information. Extensive experiments conducted on five public visual emotion classification datasets, e.g., EmoSet and FI, demonstrate the superiority of our MVP model over cutting-edge methods.
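The abstract names three prompt views (COP, HEP, LP) and an Emotion Joint Congruity Learning module but does not give implementation details. The following is a minimal sketch, assuming a CLIP-style setup with frozen image and caption encoders, of how such views might be fused; all module names, dimensions, and the attention-based fusion strategy are assumptions for illustration, not the MVP architecture itself.

```python
# Hypothetical sketch of multi-view prompt fusion (not the authors' implementation).
# Assumes image and caption (COP view) features come from frozen encoders.
import torch
import torch.nn as nn


class MultiViewPromptFusion(nn.Module):
    """Illustrative fusion of context (COP), hybrid emotion (HEP), and label (LP) prompts."""

    def __init__(self, embed_dim: int = 512, num_classes: int = 8):
        super().__init__()
        # Learnable label-prompt embeddings, one per emotion class (assumption).
        self.label_prompts = nn.Parameter(torch.randn(num_classes, embed_dim) * 0.02)
        # Cross-modal attention standing in for the Emotion Joint Congruity
        # Learning module, which the abstract names but does not detail.
        self.congruity = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, image_feat, caption_feat):
        # image_feat, caption_feat: (batch, embed_dim) from frozen encoders.
        # HEP view: let visual and textual emotion features attend to each other.
        tokens = torch.stack([image_feat, caption_feat], dim=1)   # (B, 2, D)
        fused, _ = self.congruity(tokens, tokens, tokens)         # (B, 2, D)
        fused = fused.mean(dim=1)                                 # (B, D)
        # LP view: bias the fused representation toward the label-prompt space.
        label_sim = fused @ self.label_prompts.t()                # (B, num_classes)
        return self.classifier(fused) + label_sim


# Usage with random stand-in features; a real pipeline would feed encoder outputs
# for the image and its generated caption.
model = MultiViewPromptFusion()
img = torch.randn(4, 512)
cap = torch.randn(4, 512)
logits = model(img, cap)   # (4, 8) emotion class scores
```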
