Abstract

Advancements in computer vision and deep learning have made deepfake visual media difficult to distinguish from authentic content. While existing detection frameworks achieve strong performance on challenging deepfake datasets, these approaches consider only a single perspective. More importantly, a single view can neither cover complex forgery scenarios nor exploit the correlation among the multiple views of information contained in an image. In this article, to mine a new view for deepfake detection and exploit the correlation of the multi-view information contained in images, we propose a novel triple complementary streams detector (TCSD). First, a novel depth estimator is designed to extract depth information (DI), which has not been used in previous methods. Then, to supplement the depth information and obtain comprehensive forgery clues, we consider the incoherence between image foreground and background information (FBI) and the inconsistency between local and global information (LGI). In addition, we design an attention-based multi-scale feature extraction (MsFE) module to extract more complementary features from DI, FBI, and LGI. Finally, two attention-based feature fusion modules are proposed to adaptively fuse the information. Extensive experimental results show that the proposed approach achieves state-of-the-art performance in deepfake detection.
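The adaptive fusion of the three streams can be sketched at a high level. The following minimal Python example is an illustrative assumption, not the paper's implementation: the function name `fuse_streams`, the use of a mean-activation score per stream, and the toy feature vectors are all hypothetical. It shows one common form of attention-based fusion, where each stream receives a softmax weight and the fused feature is the weighted sum of the stream features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(di, fbi, lgi):
    """Adaptively fuse three equal-length feature vectors.

    Each stream gets a scalar attention score (here, its mean
    activation -- an illustrative choice); softmax turns the scores
    into weights, and the fused feature is the weighted sum of the
    three streams.
    """
    streams = [di, fbi, lgi]
    scores = [sum(s) / len(s) for s in streams]
    weights = softmax(scores)
    fused = [
        sum(w * s[i] for w, s in zip(weights, streams))
        for i in range(len(di))
    ]
    return fused, weights

# Toy features standing in for the depth (DI),
# foreground/background (FBI), and local/global (LGI) streams.
di = [0.2, 0.8, 0.1]
fbi = [0.5, 0.4, 0.6]
lgi = [0.9, 0.1, 0.3]
fused, weights = fuse_streams(di, fbi, lgi)
```

Because the weights form a convex combination, each fused component stays within the range spanned by the three streams at that position; in a real network the scalar scores would be produced by learned attention layers rather than a fixed mean.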
