Existing deepfake detection methods often suffer performance drops when confronted with unseen forgeries that exhibit uncertain domain shifts across datasets or contain fewer manipulation artifacts. To tackle this issue, this paper proposes a novel Domain Shift Modeling (DSM) framework built on a vision transformer backbone, in which an Attention-Guided Patch Masking (AGPM) module and a Feature Statistic Shift Estimation (FSSE) module are developed to model domain shifts, helping the network withstand domain perturbations and generalize better to potential shifts. Specifically, DSM first uses AGPM to select the top-K patches that contribute most to the final classification according to the attention weights and randomly masks part of them, simulating hard samples with fewer manipulation artifacts as encountered in cross-dataset settings. The FSSE module then estimates the distributions of the feature statistics, e.g., mean and standard deviation, and randomly resamples feature statistics from the estimated distributions to simulate different data distributions and diverse forgery features, which are vital for generalization. Equipped with these two components, the network can be trained to alleviate domain shifts and exhibits stronger generalization ability. Extensive experiments on several public datasets demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.
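To make the two operations described above concrete, the following is a minimal PyTorch-style sketch of attention-guided patch masking and feature-statistic resampling. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of the CLS-token attention row to rank patches, and hyper-parameters such as `top_k` and `mask_ratio` are hypothetical choices.

```python
# Illustrative sketch (assumptions): AGPM-style masking of highly attended
# patches and FSSE-style resampling of per-sample feature statistics.
import torch


def attention_guided_patch_mask(tokens, cls_attn, top_k=32, mask_ratio=0.5):
    """Mask a random subset of the top-K most attended patch tokens.

    tokens:   (B, N, C) patch embeddings (CLS token excluded)
    cls_attn: (B, N)    attention weights from the CLS token to each patch
    """
    B, N, C = tokens.shape
    # Indices of the K patches contributing most to the classification.
    topk_idx = cls_attn.topk(top_k, dim=1).indices                # (B, K)
    # Randomly choose a fraction of those patches to mask out.
    drop = torch.rand(B, top_k, device=tokens.device) < mask_ratio
    mask = torch.ones(B, N, device=tokens.device)
    mask.scatter_(1, topk_idx, (~drop).float())
    return tokens * mask.unsqueeze(-1)


def feature_statistic_shift(tokens, eps=1e-6):
    """Resample per-sample feature statistics (mean / std) from Gaussians
    whose spreads are estimated across the batch, then re-normalize."""
    mu = tokens.mean(dim=1, keepdim=True)                         # (B, 1, C)
    sig = tokens.std(dim=1, keepdim=True) + eps                   # (B, 1, C)
    # Batch-level uncertainty of the statistics themselves.
    sig_mu = mu.std(dim=0, keepdim=True) + eps
    sig_sig = sig.std(dim=0, keepdim=True) + eps
    # Draw perturbed statistics and re-normalize the tokens with them.
    new_mu = mu + torch.randn_like(mu) * sig_mu
    new_sig = sig + torch.randn_like(sig) * sig_sig
    return (tokens - mu) / sig * new_sig + new_mu
```

In practice such augmentations would typically be applied only during training, and often stochastically (e.g., with some probability per batch), so that the detector sees both perturbed and unperturbed forgery features.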