Abstract

Recent progress in deep learning, particularly in generative models, makes it easier to synthesize sophisticated forged faces in videos, posing severe threats to personal privacy and reputation on social media. It is therefore highly necessary to develop forensic approaches that distinguish forged videos from authentic ones. Existing work concentrates on frame-level cues and makes insufficient use of the rich temporal information in videos. Although some approaches identify forgeries from the perspective of motion inconsistency, there is so far no promising spatiotemporal feature fusion strategy. To this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module, which fuses deep features of consecutive video frames without any recurrent units. Our approach starts by cropping the face region with some background retained, which shifts the learning objective from the manipulations themselves to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections, which help preserve the low-level features useful for detection, is then used to extract frame-level features. Finally, the CWSA module makes the real-or-fake decision by aggregating the deep features of the frame sequence. Evaluation on several large facial video manipulation benchmarks demonstrates its effectiveness. On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods by a significant margin.
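The cropping step described in the abstract, which keeps some background around the face, can be illustrated with a minimal sketch. The function name `expand_face_box`, the `(x0, y0, x1, y1)` box layout, and the default margin of 0.3 are assumptions made for illustration; the abstract does not state how much background is kept or which face detector supplies the box.

```python
def expand_face_box(box, image_size, margin=0.3):
    """Enlarge a face box so that some background is kept around the face.

    box        -- (x0, y0, x1, y1) in pixels, as returned by some face detector
    image_size -- (width, height) of the frame
    margin     -- fraction of the box size added on each side (assumed value)
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    pad_x = int(round((x1 - x0) * margin))
    pad_y = int(round((y1 - y0) * margin))
    # Clamp to the frame so the crop never leaves the image.
    return (
        max(0, x0 - pad_x),
        max(0, y0 - pad_y),
        min(width, x1 + pad_x),
        min(height, y1 + pad_y),
    )


if __name__ == "__main__":
    print(expand_face_box((100, 100, 200, 200), (640, 480)))
```

With a crop like this, the network sees both untouched background pixels and possibly manipulated face pixels in the same input, which is the contrast the abstract refers to.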

Highlights

  • Recent progress in deep learning, particularly in generative models, makes it easier to synthesize sophisticated forged faces in videos, posing severe threats to personal privacy and reputation on social media

  • On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods by a significant margin

  • According to the clues used, detection approaches for face video manipulation can be divided into two main groups: intraframe information based and interframe information based. The former focuses on spatial artifacts and realizes video manipulation detection by processing independent frames. The latter captures the dynamic flaws in videos through temporal models like Recurrent Neural Network (RNN) [3] or optical flow [4]


Summary

Related Work

With the help of the Nonnegative Matrix Factorization model and histograms of the Discrete Cosine Transform, multiple JPEG compression can be successfully detected, which indirectly establishes the authenticity of images. Another popular kind of approach is to discover clues that are related to the camera itself. Most detection approaches based on dynamic artifacts first use a CNN backbone to extract features from every single frame. When face and head movements are modeled as the unique speaking pattern of a specific individual, a high prediction error can be a strong hint of a fake. Biological signals such as eye blinking and pulse are discriminating cues for exposing DeepFakes.

The proposed CWSA module recombines the feature maps into a new feature sequence, which is compressed to a vector. A single neural unit with sigmoid activation is connected to this vector and makes the real-or-fake classification. The pipeline of the proposed CWSA is summarized in Algorithm 1.
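Algorithm 1 is not reproduced on this page, so the following is only a rough sketch of the three steps named above: regrouping feature maps channel by channel across frames, compressing the result to a vector, and feeding it to one sigmoid unit. The function name `channel_wise_aggregate`, the `(T, C, H, W)` layout, and the use of a plain average as the compression step are all assumptions; the paper's module may compress the sequence differently (for example, with learned convolutions).

```python
import numpy as np


def channel_wise_aggregate(features, weights, bias):
    """Score a frame sequence as fake (near 1) or real (near 0).

    features -- array of shape (T, C, H, W): C feature maps for each of T frames
    weights  -- array of shape (C,): weights of the single output unit
    bias     -- scalar bias of the single output unit
    """
    t, c, h, w = features.shape
    # Regroup channel-wise: for each channel, gather its maps from all T frames.
    regrouped = features.transpose(1, 0, 2, 3)          # (C, T, H, W)
    # Compress each channel's sequence to one number (assumed: plain average).
    vector = regrouped.reshape(c, -1).mean(axis=1)      # (C,)
    # Single neural unit with sigmoid activation.
    logit = float(vector @ weights + bias)
    return 1.0 / (1.0 + np.exp(-logit))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16, 7, 7))   # 8 frames, 16 channels, 7x7 maps
    score = channel_wise_aggregate(feats, rng.normal(size=16), 0.0)
    print(f"fake probability: {score:.3f}")
```

Because the regrouping is only a transpose followed by a reduction, the temporal fusion needs no recurrent state, which matches the abstract's claim that CWSA works without recurrent units.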

