Abstract
Most existing deep learning models for image forgery localization rely on large numbers of high-quality labeled samples for training. Their training is performed offline, without adaptation to the image under scrutiny. In this paper, we propose to learn the forgery traces at run time from the suspicious image itself. To this end, a Variational Auto-Encoder (VAE) is trained to reconstruct small cliques of the suspicious image, and cliques with anomalously large reconstruction errors are identified as forged. To further enhance performance, a Vision Transformer (ViT) is employed as the VAE encoder, and multi-modal input information is exploited by considering noise inconsistency, high-pass residual inconsistency, and edge discontinuity. Evaluation on widely used benchmark datasets shows that our method outperforms existing blind methods by a large margin and is competitive with approaches that use ground truth for supervised training.
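The following is a minimal sketch of the core run-time idea described above: fit a small VAE to patches ("cliques") of the single suspicious image, then score each patch by its reconstruction error, so that anomalously large errors indicate forgery. All names, architecture choices, and hyper-parameters here are illustrative assumptions, not the paper's implementation (which uses a ViT encoder and multi-modal inputs rather than this simple MLP on RGB patches).

```python
# Illustrative sketch only: run-time VAE fitting on one suspicious image,
# with per-patch reconstruction error as the forgery score.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH = 16  # assumed clique/patch size

class PatchVAE(nn.Module):
    def __init__(self, dim=3 * PATCH * PATCH, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def to_patches(img):
    # img: (3, H, W) tensor in [0, 1]; split into non-overlapping patches
    p = img.unfold(1, PATCH, PATCH).unfold(2, PATCH, PATCH)
    return p.permute(1, 2, 0, 3, 4).reshape(-1, 3 * PATCH * PATCH)

def anomaly_scores(img, epochs=200, beta=1e-3):
    x = to_patches(img)
    vae = PatchVAE()
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for _ in range(epochs):  # run-time training on this image only
        recon, mu, logvar = vae(x)
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        loss = F.mse_loss(recon, x) + beta * kld
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # higher error => more likely forged
        recon, _, _ = vae(x)
        return (recon - x).pow(2).mean(dim=1)
```

Since the VAE is dominated by the pristine majority of patches, it reconstructs them well, while patches carrying inconsistent forgery traces remain poorly modeled and stand out in the score map.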