Deepfake technology, which encompasses video manipulation techniques implemented with deep learning algorithms, such as face swapping and expression alteration, can now generate fake videos that are increasingly difficult for human observers to detect, posing a significant threat to societal security. Existing deepfake video detection methods aim to identify such manipulated content and so prevent the spread of misinformation. However, these methods often generalize poorly, performing badly on fake videos from outside their training datasets. Moreover, research on precisely localizing manipulated regions within deepfake videos is limited, primarily because datasets with fine-grained annotations that specify which regions have been manipulated are not available. To address these challenges, this paper proposes a novel spatial-based training method that detects spatial manipulations in deepfake videos without requiring fake samples. By combining multi-part local displacement deformation with fusion, we generate more diverse deepfake feature data, which improves detection accuracy for specific manipulation methods and also produces mixed-region labels to guide manipulation localization. We use the Swin-Unet model for manipulation localization and train it with a classification loss, a local difference loss, and a manipulation localization loss to improve the precision of both localization and detection. Experimental results demonstrate that the proposed spatial-based training method without fake samples effectively simulates the features present in real datasets. Our method achieves satisfactory detection accuracy on datasets such as FF++, Celeb-DF, and DFDC, while accurately localizing the manipulated regions. These findings indicate the effectiveness of the proposed self-blending method and model in deepfake video detection and manipulation localization.
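The abstract does not give the details of the deformation and fusion step, so the following is only a minimal sketch of the general idea under stated assumptions: several rectangular parts of a real face crop are replaced by slightly displaced copies of the same image, and the result is blended back into the original with a softened mask that doubles as the mixed-region label. The names `make_pseudo_fake` and `box_blur`, the rectangular parts, the `max_shift` parameter, and the box-blur softening are all hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np


def box_blur(mask, k=3):
    """Soften a 2-D mask with a k x k mean filter (edge-padded)."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="edge")
    out = np.zeros_like(mask)
    for i in range(k):
        for j in range(k):
            out += padded[i:i + mask.shape[0], j:j + mask.shape[1]]
    return out / (k * k)


def make_pseudo_fake(real, boxes, max_shift=4, seed=0):
    """Build a pseudo-fake sample and its region label from a real image.

    real  : float array of shape (H, W, C), a genuine face crop
    boxes : list of (top, left, height, width) parts to disturb (assumed)
    Returns (fake, mask); mask has shape (H, W) with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    source = real.copy()
    mask = np.zeros(real.shape[:2], dtype=np.float32)
    for top, left, bh, bw in boxes:
        # Draw an independent random displacement for each part.
        dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
        shifted = np.roll(real, shift=(dy, dx), axis=(0, 1))
        rows = slice(top, top + bh)
        cols = slice(left, left + bw)
        source[rows, cols] = shifted[rows, cols]
        mask[rows, cols] = 1.0
    # Soft edges avoid an obvious seam and give a graded (mixed) label.
    mask = box_blur(mask)
    alpha = mask[..., None]
    fake = alpha * source + (1.0 - alpha) * real
    return fake, mask
```

In a training loop along these lines, `fake` would be fed to the detector with a "fake" class label, and `mask` would serve as the target for the localization branch; pixels where the mask is zero are left identical to the real image.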