Lip Sync Matters: A Novel Multimodal Forgery Detector

Sahibzada Adil Shahzad, Yu Tsao, Sarwar Khan, Yan-Tsung Peng, Hsin-Min Wang, Ammarah Hashmi

doi:10.23919/apsipaasc55919.2022.9980296

Abstract

Deepfake technology has advanced considerably, but it is a double-edged sword for the community. It can be used for beneficial purposes, such as restoring vintage content in old movies, or for nefarious ones, such as creating fake footage to manipulate the public or distributing non-consensual pornography. Considerable work has gone into combating its misuse, and fake footage can now be detected with good performance thanks to the availability of numerous public datasets and unimodal deep learning-based models. However, these methods are insufficient for detecting multimodal manipulations, such as those affecting both the visual and acoustic streams. This work proposes a novel lip-reading-based multimodal Deepfake detection method called “Lip Sync Matters.” It targets high-level semantic features, detecting forged videos by exploiting the mismatch between the lip sequence extracted from the video and the synthetic lip sequence generated from the audio by the Wav2Lip model. Experimental results show that the proposed method outperforms several existing unimodal, ensemble, and multimodal methods on the publicly available multimodal FakeAVCeleb dataset.
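
The mismatch idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how lip features are extracted, how the two sequences are compared, or how the final decision is made, so everything below is an assumption made for illustration: each frame is represented by a hypothetical feature vector, the comparison uses cosine distance averaged over frames, and the decision is a fixed threshold. The function names (`cosine_distance`, `lip_mismatch_score`, `is_forged`) and the threshold value are hypothetical and are not taken from the paper, which may well use a learned classifier instead.

```python
import math
from typing import List, Sequence

# Hypothetical per-frame lip feature vector (e.g. an embedding of a mouth crop).
Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as uninformative
    return 1.0 - dot / (norm_a * norm_b)


def lip_mismatch_score(video_lips: List[Vector], synthetic_lips: List[Vector]) -> float:
    """Mean per-frame distance between the lip sequence taken from the video
    and the lip sequence synthesized from the audio track."""
    if not video_lips or len(video_lips) != len(synthetic_lips):
        raise ValueError("sequences must be non-empty and of equal length")
    distances = [cosine_distance(v, s) for v, s in zip(video_lips, synthetic_lips)]
    return sum(distances) / len(distances)


def is_forged(video_lips: List[Vector], synthetic_lips: List[Vector],
              threshold: float = 0.5) -> bool:
    """Flag the clip as forged when the mismatch exceeds an assumed threshold."""
    return lip_mismatch_score(video_lips, synthetic_lips) > threshold
```

In this sketch, a genuine clip would be expected to yield a low score because the lips in the video agree with the lips implied by the audio, while a clip with manipulated video or audio would yield a higher one. In practice the threshold would have to be tuned on labeled data.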

Full Text