Deepfake techniques can synthesize realistic images, audio, and video, benefiting entertainment, education, healthcare, and other industries. However, their abuse may threaten personal privacy, social stability, and even national security. The development of deepfake detection methods is therefore attracting growing attention. Existing works mainly focus on detecting common videos made for entertainment purposes. In contrast, fake videos maliciously synthesized for Persons of Interest (PoI, i.e., people in authoritative positions with broad public influence) are much more harmful to society because of celebrity endorsement. However, no dedicated benchmark exists to drive related research in the community. Motivated by this observation, we present the first large-scale benchmark dataset, named FakePoI, to enable research on fake PoI detection. It contains numerous fake videos of important people from all walks of life, e.g., police chiefs, city mayors, famous artists, and well-known Internet bloggers. In summary, FakePoI includes 11,092 synthesized videos in which only a few clips, rather than the entire video, are fake. Previous fake detection algorithms degrade heavily or even fail on FakePoI due to two main challenges.
On the one hand, the rich diversity of our fake videos makes it difficult to find universally applicable patterns for detection. On the other hand, the high credibility contributed by the presence of real frames easily confuses a common detector. To tackle these challenges, we present an amplifier framework that highlights the feature gap between real and generated video frames. Specifically, we present a quadruplet loss that narrows the distances among all real PoIs while pushing real and fake PoIs apart in the embedding space. We implement our framework and conduct extensive experiments on the proposed benchmark. The quantitative results demonstrate that our approach significantly outperforms existing methods, setting a strong baseline on FakePoI. The qualitative analysis also shows its superiority. We will release our dataset and code at https://github.com/cslltian/deepfake-detection to encourage future research in this valuable area.
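The abstract does not give the form of the quadruplet loss, so the following is only a minimal sketch of the stated objective (pull real embeddings together, push real and fake embeddings apart). It swaps in a simplified pairwise margin loss rather than a true quadruplet formulation; the function names, the `margin` parameter, and the use of squared Euclidean distance are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def real_fake_margin_loss(real, fake, margin=1.0):
    """Hypothetical stand-in for the paper's loss, not the authors' formulation.

    real: (n_real, d) array of embeddings of real frames.
    fake: (n_fake, d) array of embeddings of generated frames.
    margin: assumed hinge margin on squared real-fake distances.
    """
    # Pull term: mean squared distance over all real-real pairs
    # (self-pairs are included and contribute zero).
    pull = pairwise_sq_dists(real, real).mean()
    # Push term: hinge penalty on real-fake pairs closer than the margin.
    push = np.maximum(0.0, margin - pairwise_sq_dists(real, fake)).mean()
    return float(pull + push)


if __name__ == "__main__":
    real = np.array([[0.0, 0.0], [0.1, 0.0]])
    fake = np.array([[0.2, 0.0], [3.0, 0.0]])
    print(real_fake_margin_loss(real, fake))
```

In this sketch the loss reaches zero only when all real embeddings coincide and every fake embedding lies at least `margin` away (in squared distance) from every real one, which mirrors the "narrow and push apart" behavior described above.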