Abstract
Recent progress in deep learning, in particular in generative models, makes it easier to synthesize sophisticated forged faces in videos, posing severe threats to personal privacy and reputation on social media. It is therefore highly necessary to develop forensic approaches that distinguish forged videos from authentic ones. Existing works focus on exploring frame-level cues but fall short of leveraging the rich temporal information. Although some approaches identify forgeries from the perspective of motion inconsistency, a promising spatiotemporal feature fusion strategy is still lacking. Towards this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module to fuse deep features of consecutive video frames without any recurrent units. Our approach starts by cropping the face region with some background retained, which transforms the learning objective from the manipulations themselves to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections, which are conducive to preserving detection-helpful low-level features, is then used to extract frame-level features. The CWSA module finally makes the real-or-fake decision by aggregating deep features over the frame sequence. Evaluation against a list of large facial video manipulation benchmarks illustrates its effectiveness. On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods with significant advantages.
Highlights
Recent progress in deep learning, in particular in generative models, makes it easier to synthesize sophisticated forged faces in videos, posing severe threats to personal privacy and reputation on social media
On FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods with significant advantages
According to the clues used, the detection approaches for face video manipulation can be mainly divided into two categories: intraframe information based and interframe information based. The former focuses on spatial artifacts and realizes video manipulation detection by processing independent frames. The latter captures the dynamic flaws in videos through temporal models like the Recurrent Neural Network (RNN) [3] or optical flow [4]
Summary
With the help of a Nonnegative Matrix Factorization model and histograms of Discrete Cosine Transform coefficients, multiple JPEG compression can be successfully detected and, indirectly, the authenticity of images. Another kind of popular approach is to discover clues that are related to the camera itself. Most dynamic-artifact-based detection approaches first utilize a CNN backbone to extract features of every single frame. By modeling face and head movements as the unique speaking pattern of a specific individual, a high prediction error can be a strong hint of a fake. Biological signals such as eye blinking and pulse are discriminating cues to expose DeepFakes. The proposed CWSA module recombines the feature maps into a new feature sequence, which is compressed to a vector; a single neuron with sigmoid activation is connected to it and makes the real-or-fake classification. The pipeline of the proposed CWSA is summarized in Algorithm 1
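The aggregation-and-classification step described above can be illustrated with a minimal sketch. This is not the paper's actual CWSA implementation: the function names (`cwsa_aggregate`, `classify`), the uniform temporal weights, and the toy feature dimensions are all assumptions made for illustration. It only shows the general idea of combining each channel's values across the frame sequence into one video-level vector and feeding that vector to a single sigmoid neuron.

```python
import math
import random

def cwsa_aggregate(frame_feats, temporal_w):
    """Channel-wise aggregation (simplified, hypothetical): for each channel c,
    combine that channel's values across all T frames with a weighted sum,
    producing one C-dimensional video-level vector."""
    T = len(frame_feats)
    C = len(frame_feats[0])
    return [sum(temporal_w[t] * frame_feats[t][c] for t in range(T))
            for c in range(C)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(video_vec, w, b):
    """Single neuron with sigmoid activation over the aggregated vector,
    returning a fake probability in (0, 1)."""
    return sigmoid(sum(wi * vi for wi, vi in zip(w, video_vec)) + b)

# Toy example: 4 frames, each with a 3-channel frame-level feature.
random.seed(0)
feats = [[random.random() for _ in range(3)] for _ in range(4)]
temporal_w = [0.25] * 4  # uniform temporal weights (an assumption; a real
                         # module would learn these from data)
vec = cwsa_aggregate(feats, temporal_w)
prob = classify(vec, w=[0.5, -0.2, 0.1], b=0.0)
print(len(vec), 0.0 < prob < 1.0)
```

In practice the per-frame features would come from the CNN backbone and the temporal and classifier weights would be learned end-to-end; the sketch fixes them by hand only to keep the example self-contained.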