Abstract

Public concern about deepfake face forgery has risen continually in recent years. Most deepfake detection approaches attempt to learn discriminative features between real and fake faces through end-to-end trained deep neural networks. However, the majority of them suffer from poor generalization across different data sources, forgery methods, and/or post-processing operations. In this paper, following a simple but effective principle of discriminative representation learning, i.e., learning features with intra-class consistency and inter-class diversity, we leverage a novel transformer-based self-supervised learning method and an effective data augmentation strategy for generalizable deepfake detection. Considering that the differences between real and fake images are often subtle and local, the proposed method first utilizes Self Prediction Learning (SPL) to learn rich hidden representations by predicting masked patches at a pre-training stage; intra-class consistency cues can thus be mined from images without deepfake labels. After pre-training, the discrimination model is fine-tuned via multi-task learning, comprising a deepfake classification task and a forgery mask estimation task. Fine-tuning is facilitated by our new data augmentation method, the Adjustable Forgery Synthesizer (AFS), which explicitly and conveniently simulates the process of synthesizing deepfake images at various levels of visual realism. AFS largely prevents overfitting caused by insufficient diversity in the training data. Comprehensive experiments demonstrate that our method outperforms state-of-the-art competitors on several popular benchmark datasets in terms of generalization to unseen forgery methods and untrained datasets.
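
To make the pre-training stage concrete, below is a minimal PyTorch sketch of masked-patch self-prediction in the spirit of SPL. The module name `PatchPredictor`, the tiny encoder configuration, and the 0.75 mask ratio are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of masked-patch self-prediction pre-training (SPL-style).
import torch
import torch.nn as nn

class PatchPredictor(nn.Module):
    """Tiny transformer that reconstructs masked image patches."""
    def __init__(self, patch_dim=768, num_patches=196, depth=4, heads=8):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, num_patches, patch_dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, patch_dim))
        layer = nn.TransformerEncoderLayer(patch_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(patch_dim, patch_dim)  # predicts raw patch values

    def forward(self, patches, mask):
        # patches: (B, N, D) flattened patch embeddings
        # mask:    (B, N) bool, True where the patch is hidden from the model
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(patches), patches)
        x = self.encoder(x + self.pos)
        return self.head(x)

def spl_loss(model, patches, mask_ratio=0.75):
    """Reconstruction loss computed only on the masked positions."""
    B, N, _ = patches.shape
    mask = torch.rand(B, N, device=patches.device) < mask_ratio
    pred = model(patches, mask)
    return ((pred - patches) ** 2)[mask].mean()
```

Because this objective needs no real/fake labels, any large corpus of face images can serve as pre-training data, which is how intra-class consistency cues can be mined label-free.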
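
The synthesis idea behind AFS can likewise be sketched as mask-guided blending. The function below, including its name `synthesize_forgery` and its blur/opacity knobs, is a hypothetical illustration of one way to expose adjustable realism levels; the paper's actual synthesizer may differ.

```python
# Minimal sketch of an adjustable forgery synthesizer: blend a source face
# into an aligned target under a soft mask whose blur and opacity act as
# explicit "realism" knobs.
import numpy as np
import cv2

def synthesize_forgery(target, source, face_mask, blur_sigma=5.0, opacity=1.0):
    """Return (blended image, soft forgery mask).

    target, source: HxWx3 uint8 face images aligned to the same landmarks.
    face_mask:      HxW float32 in [0, 1], 1 inside the face region.
    blur_sigma:     larger -> softer blending boundary -> more realistic fake.
    opacity:        fraction of the source that is blended in.
    """
    soft = cv2.GaussianBlur(face_mask, (0, 0), blur_sigma) * opacity
    soft3 = soft[..., None]
    blended = (soft3 * source + (1.0 - soft3) * target).astype(np.uint8)
    return blended, soft  # the soft mask doubles as the mask-estimation label
```

Sampling `blur_sigma` and `opacity` over wide ranges yields synthetic fakes at many realism levels, which is the kind of training-data diversity the abstract attributes to AFS.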
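
Finally, a plausible form of the multi-task fine-tuning objective, assuming a single weighting hyperparameter `lambda_mask` (an assumption; the paper may balance the two tasks differently):

```python
# Minimal sketch of the multi-task fine-tuning objective: binary real/fake
# classification plus dense forgery-mask estimation.
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, mask_logits, labels, gt_masks, lambda_mask=1.0):
    # cls_logits:  (B,)         real/fake logits
    # labels:      (B,)         float targets in {0, 1}
    # mask_logits: (B, 1, H, W) per-pixel forgery logits
    # gt_masks:    (B, 1, H, W) soft masks, e.g. from the synthesizer above
    loss_cls = F.binary_cross_entropy_with_logits(cls_logits, labels)
    loss_mask = F.binary_cross_entropy_with_logits(mask_logits, gt_masks)
    return loss_cls + lambda_mask * loss_mask
```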
