Abstract
Social media is now flooded with hyper-realistic face-synthesis videos produced by rapidly advancing DeepFake technology, which poses a serious threat to public security and calls for further research on deepfake video detection. Existing methods either capture spatial artifacts in isolation or extract homogeneous temporal inconsistency, but pay little attention to dynamic spatial-temporal inconsistency. To address this gap, we propose a novel Multi-Rate Excitation Network (MRE-Net) that excites dynamic spatial-temporal inconsistency at multiple rates for deepfake video detection. MRE-Net consists of two components: Bipartite Group Sampling (BGS) and multiple rate branches. BGS samples the entire video into multiple bipartite groups at different rates to cover the varied dynamics of facial motion. The rate branches then capture both short-term and long-term spatial-temporal inconsistency from their corresponding bipartite groups. Concretely, in the early stages of each rate branch, a Momentary Inconsistency Excitation (MIE) module encodes spatial artifacts and intra-group short-term temporal inconsistency. In the last stages, a Longstanding Inconsistency Excitation (LIE) module perceives inter-group long-term temporal dynamics. Extensive experiments and visualizations on four popular datasets demonstrate the effectiveness of the proposed method against state-of-the-art deepfake detection methods.