Abstract
Sarcasm is a form of sentiment expression that highlights the disparity between a person’s true intentions and the content they explicitly present. With the rapid growth of multimodal data on social platforms, sarcasm detection across modalities has become a pivotal area of research. Although previous studies have extensively examined multimodal feature extraction, fusion, and the modeling of inter-modal incongruities, they have often neglected the subtle sentiment cues inherent in sarcastic multimodal data. They also have not adequately addressed the sparse distribution of sarcastic features and the tenuous connections between them, both within and across modalities. To address these gaps, we introduce a hierarchical fusion model that integrates sentiment information for enhanced multimodal sarcasm detection. Specifically, we apply attribute-object matching to the image modality and treat the result as an auxiliary attribute modality. Sentiment information is then extracted from each modality and combined to obtain a more comprehensive intra-modal representation. Moreover, we model inter-modal incongruity relationships using a cross-modal Transformer. We also introduce a sentiment-aware image-text contrastive loss to better align the semantics of images and text. By strengthening this alignment, our model is better equipped to understand incongruous relationships. Experiments demonstrate that our hierarchical fusion model achieves state-of-the-art performance on the multimodal sarcasm detection task.
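To make the image-text contrastive objective mentioned above more concrete, the following is a minimal sketch of a generic symmetric image-text contrastive (InfoNCE-style) loss. It is not the authors' implementation: the abstract does not specify how sentiment information enters the loss, so the sentiment-aware component is omitted here. The function name `image_text_contrastive_loss`, the `temperature` value of 0.07, and the toy two-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # computed with a numerically stable log-sum-exp.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def image_text_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: image_embs[i] and text_embs[i] form a matched pair;
    # every other combination in the batch is treated as a negative.
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    # Cosine similarity matrix (images x texts), scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    sim_t = [list(col) for col in zip(*sim)]  # texts x images
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_cross_entropy_diag(sim) + _cross_entropy_diag(sim_t))


# Toy check: correctly paired embeddings should give a lower loss
# than the same embeddings with the pairing swapped.
aligned = image_text_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[1.0, 0.0], [0.0, 1.0]])
shuffled = image_text_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                       [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this sketch, minimizing the loss pulls each image embedding toward its paired text embedding and pushes it away from the other texts in the batch. A sentiment-aware variant would presumably modify how pairs are weighted or selected, but that detail is not given in the abstract.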