Abstract

With the continuous advancement of deepfake technology, highly realistic fake videos have become increasingly easy to produce. Unfortunately, the malicious use of deepfakes poses a significant threat to societal morality and political security. Numerous researchers have therefore proposed deepfake detection methods. However, traditional detection approaches tend to focus on specific forgery features, such as artifacts or inconsistent actions, and are thus vulnerable to specialized countermeasures. Recent studies show an intrinsic correlation between facial and audio cues that can be exploited for deepfake detection. To address these challenges and improve the robustness and generalization of deepfake detection algorithms, we propose a novel joint audio-visual deepfake detection model named AVA-CL, which detects forgeries in both the audio and visual domains. Moreover, exploiting the inherent correlation and consistency between the audio and visual modalities significantly enhances detection effectiveness. Extensive experiments demonstrate that the proposed AVA-CL model outperforms many state-of-the-art (SOTA) methods, with superior robustness and generalization. This work presents a promising approach to deepfake detection and to reducing the harm caused by malicious use of deepfakes.
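To make the audio-visual consistency idea concrete, the sketch below shows one common way such a cue can be scored: project per-clip audio and visual features into a shared embedding space, treat low cosine agreement between the two embeddings as evidence of forgery, and train the projections with a contrastive objective. This is a minimal illustration under assumed feature dimensions and module names (AVConsistencyScorer, contrastive_av_loss are hypothetical); it is not the AVA-CL architecture described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AVConsistencyScorer(nn.Module):
    """Illustrative audio-visual consistency scorer (not the AVA-CL model).

    Projects pooled audio and visual features of a clip into a shared space
    and flags the clip as likely fake when the two embeddings disagree.
    """

    def __init__(self, audio_dim=128, visual_dim=512, embed_dim=256):
        super().__init__()
        self.audio_proj = nn.Sequential(
            nn.Linear(audio_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.visual_proj = nn.Sequential(
            nn.Linear(visual_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, audio_feat, visual_feat):
        # L2-normalize so the dot product equals cosine similarity.
        a = F.normalize(self.audio_proj(audio_feat), dim=-1)
        v = F.normalize(self.visual_proj(visual_feat), dim=-1)
        consistency = (a * v).sum(dim=-1)        # in [-1, 1]; high for matching pairs
        fake_score = (1.0 - consistency) / 2.0   # in [0, 1]; high for mismatched pairs
        return fake_score, a, v


def contrastive_av_loss(a, v, temperature=0.07):
    """InfoNCE-style loss: pull matching audio/visual pairs together and
    push non-matching pairs within the batch apart."""
    logits = a @ v.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    model = AVConsistencyScorer()
    audio = torch.randn(8, 128)    # e.g., pooled spectrogram features per clip
    visual = torch.randn(8, 512)   # e.g., pooled face-crop features per clip
    fake_score, a, v = model(audio, visual)
    loss = contrastive_av_loss(a, v)
    print(fake_score.shape, loss.item())
```

In this kind of setup, genuine clips yield high audio-visual agreement after training, while manipulated audio or faces break the learned correspondence and raise the fake score; the actual AVA-CL model may realize this idea with a different architecture and objective.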
