Abstract

A deepfake video is one in which generative models alter a subject's facial features to make them appear to be a different person. Such content has positive uses, including entertainment, but it is also easy to exploit deepfake videos for harm, such as spreading fake news or creating unwanted content. There have therefore been numerous attempts to detect whether a video has been manipulated with deepfake technology so as to prevent further harm. Previous approaches attempt to detect discrepancies in video frames, for example by exploiting the temporal consistency between frames with convolutional neural networks. Although these methods produce adequate results, their accuracy is insufficient for real-world use. In this paper, we propose a novel method that uses a convolutional-neural-network-based autoencoder to classify a video as pristine or deepfake. Our method successfully disentangles latent factors of real and fake appearance, increasing classification accuracy while maintaining a relatively low time complexity and thus enhancing real-world applicability. Results from extensive experiments show improvements over state-of-the-art methods by upwards of 18.51%.
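The abstract does not specify the autoencoder's architecture, so the following is only a minimal sketch of the general idea behind autoencoder-based manipulation detection: fit an autoencoder on pristine samples, then flag inputs whose reconstruction error exceeds a threshold. For brevity it uses a *linear* autoencoder (whose optimal solution is PCA, computable in closed form via SVD) rather than a convolutional one; all dimensions, data, and the thresholding rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 4  # input / latent dimensionality (hypothetical)

# Synthetic "pristine" data living near a low-dimensional subspace.
basis = rng.normal(size=(H, D))
real = rng.normal(size=(400, H)) @ basis + 0.05 * rng.normal(size=(400, D))

mean = real.mean(axis=0)
# The optimal linear autoencoder spans the top-H right singular
# vectors of the centered training data (equivalent to PCA).
_, _, Vt = np.linalg.svd(real - mean, full_matrices=False)
W = Vt[:H].T  # encoder (D x H); the decoder is W.T

def recon_error(x):
    """Mean squared reconstruction error per sample."""
    xc = x - mean
    return np.mean((xc @ W @ W.T - xc) ** 2, axis=-1)

# Threshold chosen so ~1% of pristine training samples are flagged.
threshold = np.percentile(recon_error(real), 99)

# Held-out pristine samples vs. off-subspace "fake" samples.
real_test = rng.normal(size=(100, H)) @ basis + 0.05 * rng.normal(size=(100, D))
fake_test = rng.normal(size=(100, D)) * real.std()

real_flag_rate = float(np.mean(recon_error(real_test) > threshold))
fake_flag_rate = float(np.mean(recon_error(fake_test) > threshold))
print(f"false-positive rate: {real_flag_rate:.2f}, "
      f"fake detection rate: {fake_flag_rate:.2f}")
```

Because the fake samples do not lie near the subspace learned from pristine data, their reconstruction error is far larger, so the threshold separates the two classes cleanly in this toy setting.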
