Abstract
DeepFake face forgery has a serious negative impact on both society and individuals. Therefore, research on DeepFake detection technologies is necessary. At present, DeepFake detection technology based on deep learning has achieved acceptable results on high-quality datasets; however, its detection performance on low-quality datasets and cross-datasets remains poor. To address this problem, this paper presents a multi-scale interactive dual-stream network (MSIDSnet). The network is divided into spatial- and frequency-domain streams and uses a multi-scale fusion module to capture both the facial features of images that have been manipulated in the spatial domain under different circumstances and the fine-grained high-frequency noise information of forged images. The network fully integrates the features of the spatial- and frequency-domain streams through an interactive dual-stream module and uses vision transformer (ViT) to further learn the global information of the forged facial features for classification. Experimental results confirm that the accuracy of this method reached 99.30 % on the high-quality dataset Celeb-DF-v2, and 95.51 % on the low-quality dataset FaceForensics++. Moreover, the results of the cross-dataset experiments were superior to those of the other comparison methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Journal of Visual Communication and Image Representation
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.