Abstract

Current image tampering detection methods rely on specific forgery footprints, such as JPEG artifacts and edge inconsistencies, and borrow algorithms from image segmentation. However, these methods suffer from several issues: overfitting, dependence on only a few specific forgery footprints, and an emphasis on semantically relevant information at the expense of tampering traces. This paper proposes a model for image manipulation detection and localization based on multi-scale contrastive learning (MSCL-Net). The model exploits the difference in feature distributions between tampered and untampered regions to extract comprehensive tampering traces. It uses a dual-stream encoder that takes both the raw RGB image and SRM noise features as input. A Feature Cross-Fusion Module (FCFM) is proposed to fuse the two streams and improve the feature representation of tampered information. During decoding, an Adaptive Self-Attention Module (ASAM) filters and aggregates relevant context from coarse feature maps, and a Supervised Contrastive Learning Module (SCLM) widens the gap between tampered and untampered regions. The loss function fuses three terms: classification loss, segmentation loss, and multi-scale supervised contrastive loss. This improves the network's understanding of global differences, reduces false positives, suppresses semantic information, and strengthens the model's ability to localize tampered regions of varying sizes. Extensive experiments across multiple datasets demonstrate that our model is robust against attacks and resilient to false-positive predictions at both the image and pixel levels, and that its overall performance exceeds state-of-the-art alternatives in reliably detecting and localizing tampered images.
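
For concreteness, the following is a minimal PyTorch-style sketch of the multi-loss fusion described above, assuming the standard batch-wise supervised contrastive objective applied at each decoder scale. The loss weights, temperature, binary cross-entropy formulation, and all function and argument names here are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Standard supervised contrastive loss over a batch of embeddings:
    same-label embeddings are pulled together, different-label ones pushed apart.
    features: (N, D) region embeddings; labels: (N,) 0 = untampered, 1 = tampered.
    """
    features = F.normalize(features, dim=1)                      # unit-length rows
    sim = features @ features.T / temperature                    # (N, N) similarities
    off_diag = ~torch.eye(len(labels), dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & off_diag
    sim = sim.masked_fill(~off_diag, float("-inf"))              # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # row-wise log-softmax
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                                       # anchors with >= 1 positive
    mean_log_prob_pos = (log_prob.masked_fill(~pos_mask, 0.0).sum(1)[valid]
                         / pos_counts[valid])
    return -mean_log_prob_pos.mean()

def total_loss(cls_logits, cls_target, seg_logits, seg_mask,
               scale_feats, scale_labels,
               w_cls=1.0, w_seg=1.0, w_con=0.1):
    """Multi-loss fusion: classification loss + segmentation loss +
    multi-scale supervised contrastive loss (weights are illustrative)."""
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_target.float())
    l_seg = F.binary_cross_entropy_with_logits(seg_logits, seg_mask.float())
    l_con = sum(supervised_contrastive_loss(f, y)
                for f, y in zip(scale_feats, scale_labels))
    return w_cls * l_cls + w_seg * l_seg + w_con * l_con
```

On one reading of the abstract, `scale_feats` would hold pixel embeddings sampled from each decoder stage and `scale_labels` the corresponding values of the downsampled ground-truth mask; summing the per-scale contrastive terms is what makes the supervision multi-scale and helps localize tampered regions of varying sizes.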
