Abstract
Sarcasm is a meaningful and effective form of expression that people often use to convey sentiments contrary to their literal meaning, and such expressions are common on social media platforms. Compared with traditional text-only sarcasm detection, multi-modal sarcasm detection has proved more effective for handling the diverse forms of communication found on social networks. In this work, we propose a prompt-tuning method for multi-modal sarcasm detection (Pmt-MmSD). Specifically, to model the incongruity within the text modality, we first build a prompt-PLM network. Second, to model the incongruity between text and image, we design an inter-modality attention network (ImAN) based on the self-attention mechanism. In addition, we employ a pre-trained Vision Transformer (ViT) to encode the image modality. Extensive experiments demonstrate the effectiveness of the proposed Pmt-MmSD model, which significantly outperforms state-of-the-art results on multi-modal sarcasm detection.
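To make the described pipeline concrete, the following is a minimal sketch of a prompt-PLM text branch, a ViT image branch, and a cross-modal attention fusion step, assuming a PyTorch/Hugging Face setup. The model names, hidden dimension, cloze template, and classifier head are illustrative assumptions, not the authors' released implementation of Pmt-MmSD or ImAN.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, ViTModel


class PmtMmSDSketch(nn.Module):
    """Sketch of a prompt-based multi-modal sarcasm detector (assumed design)."""

    def __init__(self, text_model="bert-base-uncased",
                 image_model="google/vit-base-patch16-224-in21k", dim=768):
        super().__init__()
        # Text branch: a pre-trained language model queried through a
        # cloze-style prompt (prompt-tuning; template below is assumed).
        self.tokenizer = AutoTokenizer.from_pretrained(text_model)
        self.plm = AutoModel.from_pretrained(text_model)
        # Image branch: pre-trained Vision Transformer encoder.
        self.vit = ViTModel.from_pretrained(image_model)
        # Inter-modality attention: text tokens attend to image patches.
        self.iman = nn.MultiheadAttention(embed_dim=dim, num_heads=8,
                                          batch_first=True)
        self.classifier = nn.Linear(dim, 2)  # sarcastic vs. non-sarcastic

    def forward(self, texts, pixel_values):
        # Illustrative cloze template: "<tweet> It was [MASK]."
        prompts = [t + " It was " + self.tokenizer.mask_token + "."
                   for t in texts]
        enc = self.tokenizer(prompts, return_tensors="pt",
                             padding=True, truncation=True)
        text_hidden = self.plm(**enc).last_hidden_state            # (B, Lt, D)
        image_hidden = self.vit(
            pixel_values=pixel_values).last_hidden_state           # (B, Li, D)
        # Cross-modal attention: queries from text, keys/values from image.
        fused, _ = self.iman(text_hidden, image_hidden, image_hidden)
        # Classify from the representation at the [MASK] position.
        mask_pos = enc["input_ids"] == self.tokenizer.mask_token_id
        mask_repr = fused[mask_pos]                                 # (B, D)
        return self.classifier(mask_repr)
```

In an actual prompt-tuning setup the [MASK] prediction would typically be mapped to label words through the PLM's masked-language-modeling head and a verbalizer; the linear classifier here is a simplification for the sketch.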