Abstract

The widespread dissemination of facial forgery technology has raised many ethical issues and aroused broad societal concern. Most current research treats deepfake detection as a fine-grained classification task, which nevertheless makes it difficult for the feature extractor to express the features related to the real and fake attributes. This paper proposes a depth-map-guided triplet network, which consists mainly of a depth prediction network and a triplet feature extraction network. The depth map predicted by the depth prediction network effectively reflects the differences between real and fake faces in terms of discontinuity, inconsistent illumination, and blurring, and thus benefits deepfake detection. Regardless of the facial appearance changes induced by deepfake manipulation, we argue that real and fake faces should correspond to their respective latent feature spaces. In particular, the pair of real faces (original–target) should remain close in the latent feature space, while the two real–fake pairs (original–fake, target–fake) should stay far apart. Following this paradigm, we propose a triplet-loss-supervised network to extract sufficiently discriminative deep features, which minimizes the distance of the original–target pair and maximizes the distances of the original–fake and target–fake pairs. Extensive results on the public FaceForensics++ and Celeb-DF datasets validate the superiority of our method over competitors.
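
To make the triplet objective concrete, below is a minimal PyTorch sketch of a margin-based triplet loss under the paradigm described above. The function name `depth_guided_triplet_loss`, the embedding dimensionality, and the margin value are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def depth_guided_triplet_loss(f_orig, f_target, f_fake, margin=1.0):
    # Hypothetical sketch; names and margin are assumptions, not the paper's code.
    # Real-real pair (original-target): should stay close in the latent space.
    d_pos = F.pairwise_distance(f_orig, f_target)
    # Real-fake pairs (original-fake, target-fake): should stay far apart.
    d_neg1 = F.pairwise_distance(f_orig, f_fake)
    d_neg2 = F.pairwise_distance(f_target, f_fake)
    # Margin-based hinge: penalize whenever a real-fake distance does not
    # exceed the real-real distance by at least `margin`.
    loss = F.relu(d_pos - d_neg1 + margin) + F.relu(d_pos - d_neg2 + margin)
    return loss.mean()

# Usage with dummy 128-D embeddings for a batch of 4 faces:
f_o, f_t, f_f = (torch.randn(4, 128) for _ in range(3))
print(depth_guided_triplet_loss(f_o, f_t, f_f))
```

The hinge form is one common way to realize such supervision: the loss reaches zero only when both real–fake distances exceed the real–real distance by the margin, matching the stated goal of pulling the original–target pair together while pushing the fake apart from both.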
