Abstract
Fine-grained meme understanding aims to explore and comprehend the meanings of memes from multiple perspectives by performing various tasks, such as sentiment analysis, intention detection, and offensiveness detection. Existing approaches primarily focus on simple multi-modality fusion and analyze each task in isolation. However, several limitations remain to be addressed: (1) the neglect of incongruous features within and across modalities, and (2) the lack of consideration for correlations among different tasks. To this end, we leverage metaphorical information as an additional textual modality and propose a Metaphor-aware Multi-modal Multi-task Framework (M3F) for fine-grained meme understanding. Specifically, we design inter-modality attention, inspired by the Transformer, to capture the interaction between text and image. Moreover, intra-modality attention is applied to model the contradiction between the text and the metaphorical information. To learn the implicit interaction among different tasks, we introduce a multi-interactive decoder that exploits gating networks to establish relationships between the various subtasks. Experimental results on the MET-Meme dataset show that the proposed framework outperforms state-of-the-art baselines in fine-grained meme understanding.
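To make the two central mechanisms concrete, the sketch below illustrates, under stated assumptions, what Transformer-style inter-modality attention and a gated task-interaction module could look like. This is not the authors' M3F implementation; all module names, dimensions (e.g., `d_model`, `n_heads`), and the specific gating formulation are illustrative assumptions consistent with the abstract's description.

```python
# A minimal sketch (NOT the authors' released code) of the two mechanisms
# named in the abstract: (1) inter-modality attention, where text features
# attend over image features, and (2) a gating network that lets one
# subtask's representation modulate another's. Dimensions are assumptions.
import torch
import torch.nn as nn


class InterModalityAttention(nn.Module):
    """Cross-attention: text queries attend over image keys/values."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (batch, seq_t, d_model); image: (batch, seq_i, d_model)
        fused, _ = self.attn(query=text, key=image, value=image)
        return self.norm(text + fused)  # residual + norm, as in the Transformer


class GatedTaskInteraction(nn.Module):
    """Gate that blends a task's own features with another task's features."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([own, other], dim=-1)))
        return g * own + (1 - g) * other  # convex blend controlled by the gate


if __name__ == "__main__":
    text = torch.randn(2, 16, 256)    # e.g., text-token features
    image = torch.randn(2, 49, 256)   # e.g., image-patch features
    fused = InterModalityAttention()(text, image).mean(dim=1)   # (2, 256)
    # In a multi-interactive decoder, e.g. the sentiment head could be
    # conditioned on the intention head's representation via the gate.
    intention_feat = torch.randn(2, 256)
    sentiment_feat = GatedTaskInteraction()(fused, intention_feat)
    print(sentiment_feat.shape)  # torch.Size([2, 256])
```

The same cross-attention pattern could plausibly serve the intra-modality case as well, with the literal text attending over the metaphorical-information features to surface their contradiction.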