Unsupervised Learning-Based Framework for Deepfake Video Detection

Li Zhang,Shichuang Xie,Ning Zheng,Tong Qiao,Ming Xu

doi:10.1109/tmm.2022.3182509

Abstract

With the continuous development of computer hardware equipment and deep learning technology, it is easier for people to swap faces in videos by currently-emerging multimedia tampering tools, such as the most popular deepfake. It would bring a series of new threats of security. Although many forensic researches have focused on this new type of manipulation and achieved high detection accuracy, most of which are based on supervised learning mechanism with requiring a large number of labeled samples for training. In this paper, we first develop a novel unsupervised detection manner for identifying deepfake videos. The main fundamental behind our proposed method is that the face region in the real video is taken by the camera while its counterpart in the deepfake video is usually generated by the computer; the provenance of two videos is totally different. Specifically, our method includes two clustering stages based on Photo-Response Non-Uniformity (PRNU) and noiseprint feature. Firstly, the PRNU fingerprint of each video frame is extracted, which is used to cluster the full-size identical source video (regardless of its real or fake). Secondly, we extract the noiseprint from the face region of the video, which is used to identify (re-cluster for the task of binary classification) the deepfake sample in each cluster. Numerical experiments verify our proposed unsupervised method performs very well on our own dataset and the benchmark FF++ dataset. More importantly, its performance rivals that of the supervised-based state-of-the-art detectors.

Full Text