Abstract

Advances in image-generation technologies have driven significant progress in facial manipulation techniques. These techniques allow anyone to easily modify media, such as videos and images, by substituting one person's identity or facial expression with another's face. The resulting manipulated content, termed 'deepfakes', has become widely available and accessible, so accurate methods for detecting fake images are urgently needed to prevent their misuse. This paper examines the capability of the Vision Transformer (ViT), namely its extraction of global features, to detect deepfake images effectively. Comprehensive experiments show that our method is highly effective, achieving accuracy, precision, recall, and F1 scores of 99.5 to 100% on both the original and mixed datasets. To the best of our knowledge, this study is a research effort that incorporates real-world applications, specifically examining Snapchat-filtered images.
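The abstract does not detail the architecture, but the core idea of repurposing a ViT's global features for binary real-vs-fake classification can be sketched as follows. This is a minimal illustration only: the ViT-B/16 backbone, the replacement head, and the label order are assumptions for demonstration, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load a ViT-B/16 backbone pretrained on ImageNet (illustrative choice;
# the abstract does not specify the backbone or training configuration).
weights = ViT_B_16_Weights.DEFAULT
model = vit_b_16(weights=weights)

# Swap the classification head for a binary real-vs-fake head.
model.heads = nn.Linear(model.hidden_dim, 2)
model.eval()

# The pooled [CLS] token aggregates global (whole-image) features,
# which is the property the abstract highlights for deepfake detection.
x = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB image
with torch.no_grad():
    logits = model(x)             # shape: (1, 2)
probs = logits.softmax(dim=-1)    # [P(real), P(fake)] (assumed label order)
```

In practice, such a head would be fine-tuned on a labeled dataset of real and manipulated face images before the predicted probabilities are meaningful.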
