Abstract

Face forgery detection (FFD) plays a vital role in maintaining the security and integrity of various information and media systems. Forgery inconsistency caused by manipulation techniques has been proven to be effective for generalizing to the unseen data domain. However, most existing works rely on pixel-level forgery annotations to learn forgery inconsistency. To address the problem, we propose a novel Swin Transformer-based method, AGIL-SwinT, that can effectively learn forgery inconsistency using only video-level labels. Specifically, we first leverage the Swin Transformer to generate the initial mask for the forgery regions. Then, we introduce an attention-guided inconsistency learning module that uses unsupervised learning to learn inconsistency from attention. The learned inconsistency is used to revise the initial mask for enhancing forgery detection. In addition, we introduce a forgery mask refinement module to obtain reliable inconsistency labels for supervising inconsistency learning and ensuring the mask is aligned with the forgery boundaries. We conduct extensive experiments on multiple FFD benchmarks, including intra-dataset, cross-dataset and cross-manipulation testing. The experimental results demonstrate that our method significantly outperforms existing methods and generalizes well to unseen datasets and manipulation categories. Our code is available at https://github.com/woody-xiong/AGIL-SwinT.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.