Abstract

A growing volume of fake news combining text, images, and other forms of multimedia is spreading rapidly across social platforms, leading to misinformation and negative impacts. Therefore, the automatic identification of multimodal fake news has become an important research focus in both academia and industry. The key to multimedia fake news detection is to accurately extract features of both textual and visual information, as well as to mine the correlation between them. However, most existing methods merely fuse the features of different modalities without fully exploiting intra- and inter-modal connections and complementary information. In this work, we learn physical tampering cues for images in the frequency domain to supplement information in the image spatial domain, and propose a novel multimodal frequency-aware cross-attention network (MFCAN) that fuses the representations of text and image by jointly modelling intra- and inter-modal relationships between textual and visual information within a unified deep framework. In addition, we devise a new cross-modal fusion block based on the cross-attention mechanism that can leverage inter-modal as well as intra-modal relationships to complement and enhance the feature matching of text and image for fake news detection. We evaluate our approach on two publicly available datasets, and the experimental results show that our proposed model outperforms existing baseline methods.
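The abstract describes a cross-modal fusion block that combines intra-modal self-attention with inter-modal cross-attention; the paper's actual implementation is not reproduced here. Below is a minimal, illustrative PyTorch sketch of such a block, assuming a shared feature dimension for text tokens and image patches; the class name, layer sizes, and residual/normalization choices are assumptions for illustration only, not the authors' code.

```python
import torch
import torch.nn as nn


class CrossModalFusionBlock(nn.Module):
    """Illustrative cross-attention fusion block (hypothetical sketch).

    Text features attend to image features (inter-modal) and to themselves
    (intra-modal); the two views are combined with a residual connection,
    normalized, and passed through a feed-forward layer.
    """

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Inter-modal attention: text queries attend over image keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Intra-modal attention: text attends over itself.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, text_len, dim); image_feats: (batch, num_patches, dim)
        inter, _ = self.cross_attn(text_feats, image_feats, image_feats)
        intra, _ = self.self_attn(text_feats, text_feats, text_feats)
        fused = self.norm1(text_feats + inter + intra)  # residual combination
        return self.norm2(fused + self.ffn(fused))


# Example usage with hypothetical text-token and image-patch embeddings.
block = CrossModalFusionBlock(dim=256, num_heads=4)
text = torch.randn(2, 32, 256)   # e.g. BERT-style token features
image = torch.randn(2, 49, 256)  # e.g. patch features (spatial or frequency branch)
out = block(text, image)         # shape: (2, 32, 256)
```

A symmetric block with image features as queries over text keys/values could be applied in parallel, so that each modality is complemented by the other before classification; whether MFCAN does this symmetrically is not specified in the abstract.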
