Abstract
Detecting forged videos is highly desirable due to the abuse of deepfakes. Existing detection approaches focus on exploring specific artifacts in deepfake videos and fit well on certain data. However, forgery techniques keep evolving to suppress these artifacts, continually challenging the robustness of traditional deepfake detectors. As a result, progress along this line has stalled. In this article, we propose to perform deepfake detection from an unexplored voice-face matching view. Our approach is founded on two supporting observations: first, there is a high degree of homogeneity between the voice and face of an individual (i.e., they are highly correlated), and second, deepfake videos often involve mismatched identities between the voice and face due to face-swapping techniques. To this end, we develop a voice-face matching method that measures the matching degree between these two modalities to identify deepfake videos. Nevertheless, training on specific deepfake datasets makes the model overfit to the traits of particular deepfake algorithms. We instead advocate a method that quickly adapts to unseen forgeries, with a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset, and then fine-tune it on downstream deepfake data. We conduct extensive experiments over three widely exploited deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains compared to other state-of-the-art competitors. For instance, our method outperforms the baselines by nearly 2%, achieving an AUC of 86.11% on FakeAVCeleb. It is also worth noting that our method already achieves competitive results when fine-tuned on limited deepfake data.
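The core idea of voice-face matching can be sketched as follows: score the agreement between a voice embedding and a face embedding, and flag low-agreement videos as likely fakes. This is a minimal illustration under stated assumptions, not the authors' implementation; the embedding extractors, the cosine-similarity score, and the decision threshold are all assumptions introduced here.

```python
import numpy as np

def matching_score(voice_emb: np.ndarray, face_emb: np.ndarray) -> float:
    """Cosine similarity between a voice embedding and a face embedding.

    Both embeddings are assumed to come from pre-trained encoders that map
    the two modalities into a shared identity space (an assumption of this
    sketch; the paper's actual architecture may differ).
    """
    v = voice_emb / np.linalg.norm(voice_emb)
    f = face_emb / np.linalg.norm(face_emb)
    return float(np.dot(v, f))

def classify(voice_emb: np.ndarray, face_emb: np.ndarray,
             threshold: float = 0.5) -> str:
    # A low matching score suggests the voice and face belong to different
    # identities, which is characteristic of face-swapped deepfakes.
    # The threshold value here is hypothetical.
    return "fake" if matching_score(voice_emb, face_emb) < threshold else "real"
```

In practice the threshold would be tuned on validation data, and the encoders would be pre-trained on a generic audio-visual corpus before fine-tuning on deepfake data, mirroring the paradigm described in the abstract.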
Published in: ACM Transactions on Multimedia Computing, Communications, and Applications