Abstract

With the rapid development of the Internet and the widespread use of social media, the sheer volume of posts published on social media platforms has made fake news detection a challenging task. Although some works use deep learning methods to capture the visual and textual information of posts, most existing methods do not explicitly model the binary relations among image regions or text tokens, and therefore cannot deeply mine the global relational information within a single modality such as image or text. Moreover, they cannot fully exploit supplementary cross-modal information, including image-text relations, to enrich each modality. To address these problems, we propose an innovative end-to-end Cross-modal Relation-aware Networks (CRAN), which jointly models visual and textual information together with their corresponding relations in a unified framework. (1) To capture the global structural relations within a modality, we design a global relation-aware network that explicitly models the relation-aware semantics of the fragment features in the target modality from a global perspective. (2) To effectively fuse cross-modal information, we propose a cross-modal co-attention network for multi-modal information fusion, which jointly exploits the intra-modality and inter-modality relationships among image regions and textual words so that the two modalities complement and reinforce each other. Extensive experiments on two public real-world datasets demonstrate the superior performance of CRAN compared with state-of-the-art baselines.
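A minimal sketch of the two ideas the abstract names is given below; it is not the authors' implementation. It assumes fragment features are already extracted (e.g., region features for images, token embeddings for text), models global intra-modality relations with scaled dot-product self-attention, and realizes cross-modal co-attention by letting regions attend to words and words attend to regions before a simple concatenation-based fusion. All layer sizes, the pooling step, and the two-way classifier head are illustrative assumptions.

```python
# Illustrative sketch only: a global relation-aware block and a cross-modal
# co-attention block in the spirit of the abstract, not the paper's actual CRAN code.
import torch
import torch.nn as nn


class GlobalRelationAware(nn.Module):
    """Self-attention over fragment features (image regions or text tokens)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fragments, dim); every fragment attends to every other
        # fragment, so pairwise relations are modeled at a global scope.
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)


class CrossModalCoAttention(nn.Module):
    """Regions attend to words and words attend to regions, then fuse for classification."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Assumed fusion head: concatenate pooled features, predict real vs. fake.
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2)
        )

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # img: (batch, regions, dim), txt: (batch, tokens, dim)
        img_ctx, _ = self.img_to_txt(img, txt, txt)   # regions enriched by words
        txt_ctx, _ = self.txt_to_img(txt, img, img)   # words enriched by regions
        fused = torch.cat([img_ctx.mean(dim=1), txt_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)                 # logits over {real, fake}


if __name__ == "__main__":
    regions = torch.randn(8, 36, 256)   # e.g., 36 region features per image
    tokens = torch.randn(8, 50, 256)    # e.g., 50 token embeddings per post
    regions = GlobalRelationAware(256)(regions)
    tokens = GlobalRelationAware(256)(tokens)
    logits = CrossModalCoAttention(256)(regions, tokens)
    print(logits.shape)                 # torch.Size([8, 2])
```

Under these assumptions, the intra-modality blocks supply the global relation information within each modality, while the co-attention block lets the two modalities supplement each other before the final prediction.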
