RIFoL: A robust image forgery localization network for noisy images.
- Research Article
4
- 10.1117/1.jei.31.6.063051
- Dec 21, 2022
- Journal of Electronic Imaging
With the development of technology, it is becoming increasingly easy to generate tampered images that are indistinguishable from real ones to the human eye, and the malicious use of such images in news, academic papers, and criminal cases has caused great harm to society. This paper proposes a deep convolutional neural network-based image forgery localization method to uncover the subtle differences between doctored images and real images. Specifically, the network achieves tampering localization through a three-stage enhancement scheme. First, dilated convolution in the deep layers of the network keeps the feature map resolution constant, and the number of shallow convolutions is decreased to reduce the receptive field, so that the network focuses on local regions. Second, the feature enhancement module fuses shallow features with deep features to filter out content information and highlight tampering features, making full use of local and global information to improve generalization. Finally, the attention enhancement module reweights the convolutional feature maps by channel and by location to highlight the informative regions around forgery boundaries, thus guiding the network to capture more intrinsic features of image forgery. Extensive experimental results on several public datasets show that this method outperforms other state-of-the-art methods in image forgery localization.
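The first stage above rests on two pieces of arithmetic: a dilated convolution with matching padding preserves spatial resolution, and removing shallow layers shrinks the receptive field. The sketch below illustrates both with the standard convolution formulas. The layer settings (3x3 kernels, dilation 2, a 64-pixel input) are illustrative assumptions, not values taken from the paper.

```python
def conv_output_size(size, kernel, stride=1, dilation=1, padding=0):
    """Spatial output size of one convolution (standard formula)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride, dilation) layers."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += dilation * (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical deep layer: 3x3 kernel, dilation 2, padding 2, stride 1.
print(conv_output_size(64, kernel=3, dilation=2, padding=2))  # 64 (resolution kept)
# A strided 3x3 convolution halves the resolution instead.
print(conv_output_size(64, kernel=3, stride=2, padding=1))    # 32
# Fewer shallow 3x3 layers give a smaller receptive field.
print(receptive_field([(3, 1, 1)] * 4))  # 9
print(receptive_field([(3, 1, 1)] * 2))  # 5
```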
- Research Article
- 10.1109/tpami.2026.3656742
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
Recent advances in deep learning have significantly propelled the development of image forgery localization. However, existing models remain highly vulnerable to adversarial attacks: imperceptible noise added to forged images can severely mislead these models. In this paper, we address this challenge with an Adversarial Noise Suppression Module (ANSM) that generates a defensive perturbation to suppress the attack effect of adversarial noise. We observe that forgery-relevant features extracted from adversarial and original forged images exhibit distinct distributions. To bridge this gap, we introduce Forgery-relevant Features Alignment (FFA) as a first-stage training strategy, which reduces distributional discrepancies by minimizing the channel-wise Kullback-Leibler divergence between these features. To further refine the defensive perturbation, we design a second-stage training strategy, termed Mask-guided Refinement (MgR), which incorporates a dual-mask constraint. MgR ensures that the defensive perturbation remains effective for both adversarial and original forged images, restoring forgery localization accuracy to its original level. Extensive experiments across various attack algorithms demonstrate that our method significantly restores the forgery localization model's performance on adversarial images. Notably, when ANSM is applied to original forged images, the performance remains nearly unaffected. To the best of our knowledge, this is the first report of adversarial defense in image forgery localization tasks. We have released the source code and anti-forensics dataset.
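To make the FFA objective concrete, here is a minimal sketch of a channel-wise Kullback-Leibler divergence between two feature tensors. The abstract does not say how each channel is turned into a probability distribution, so the softmax over spatial positions used here is an assumption, as are the function names and the list-of-channels layout.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over one flattened channel."""
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]


def channelwise_kl(feat_ref, feat_adv):
    """Mean over channels of KL(softmax(ref_c) || softmax(adv_c)).

    Each argument is a list of channels; each channel is a flat list of
    activations. The softmax normalization is an assumption of this sketch.
    """
    per_channel = []
    for ref_c, adv_c in zip(feat_ref, feat_adv):
        log_p, log_q = log_softmax(ref_c), log_softmax(adv_c)
        kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
        per_channel.append(kl)
    return sum(per_channel) / len(per_channel)


original = [[0.2, 1.5, -0.3, 0.8], [1.0, 1.0, 0.5, -1.0]]
perturbed = [[0.9, 0.1, 0.4, -0.2], [0.0, 1.2, 0.7, 0.3]]
print(channelwise_kl(original, original))   # zero for identical features
print(channelwise_kl(original, perturbed))  # positive when distributions differ
```

Minimizing such a quantity pulls the per-channel distributions of adversarial features toward those of the original forged image, which is the alignment idea the abstract describes.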
- Conference Article
7
- 10.2991/meita-15.2015.126
- Jan 1, 2015
To address the problem of image noise level estimation, this paper proposes a noise estimation algorithm based on singular value decomposition and a neural network. The larger (head) singular values of an image are mainly determined by the main structure of the image, while the remaining (tail) singular values are determined by the noise intensity. As the noise level increases, the tail singular values increase accordingly, so singular values should be good features for noise intensity estimation. First, we add noise of known intensity to a batch of noise-free images and then select from these noisy images a certain number of fixed-size image blocks whose standard deviations are minimal. The singular values of these blocks are then fed to the neural network as input, with the corresponding noise standard deviations as the target output, to train the network. Finally, in the estimation phase, the singular values of a noisy image are fed into the trained network to predict the unknown noise intensity. The experimental results show that the proposed algorithm is quite promising: it can estimate different types of noise, including Gaussian white noise and hybrid noise, with high speed and high precision.
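The block-selection step described above can be sketched as follows. This covers only the choice of the smoothest fixed-size blocks; the singular value decomposition and the neural network are omitted. The non-overlapping tiling, the function name, and the toy image are assumptions made for illustration.

```python
import statistics


def smoothest_blocks(image, block, count):
    """Return (std, top, left) for the `count` lowest-variance blocks.

    `image` is a 2D list of grey levels; blocks are non-overlapping
    block x block tiles (the tiling scheme is an assumption).
    """
    height, width = len(image), len(image[0])
    scored = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [image[top + r][left + c]
                      for r in range(block) for c in range(block)]
            scored.append((statistics.pstdev(pixels), top, left))
    scored.sort()
    return scored[:count]


toy = [
    [5, 5, 0, 9],
    [5, 5, 9, 0],
    [1, 2, 7, 7],
    [3, 4, 7, 8],
]
for std, top, left in smoothest_blocks(toy, block=2, count=2):
    print(f"block at ({top}, {left}) has std {std:.3f}")
```

Flat blocks are preferred because their pixel variation is dominated by noise rather than by image structure, which is what makes their tail singular values informative about the noise level.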
- Research Article
4
- 10.3390/e27050535
- May 17, 2025
- Entropy (Basel, Switzerland)
Image forgery localization is critical in defending against the malicious manipulation of image content, and is attracting increasing attention worldwide. In this paper, we propose a Dual-domain Fusion Swin Transformer U-Net (DFST-UNet) for image forgery localization. DFST-UNet is built on a U-shaped encoder-decoder architecture. Swin Transformer blocks are integrated into the U-Net architecture to capture long-range context information and perceive forged regions at different scales. Considering the fact that high-frequency forgery information is an essential clue for forgery localization, a dual-stream encoder is proposed to comprehensively expose forgery clues in both the RGB domain and the frequency domain. A novel high-frequency feature extractor module (HFEM) is designed to extract robust high-frequency features. A hierarchical attention fusion module (HAFM) is designed to effectively fuse the dual-domain features. Extensive experimental results demonstrate the superiority of DFST-UNet over the state-of-the-art methods in the task of image forgery localization.
- Research Article
7
- 10.1109/tifs.2024.3381470
- Jan 1, 2024
- IEEE Transactions on Information Forensics and Security
The widespread misuse of advanced image editing tools and deep generative techniques has led to a proliferation of images with altered content in real-life scenarios, often without any discernible traces of tampering. This poses a potential threat to the security and credibility of images, so image forgery localization is urgently needed. In this paper, we propose a novel reinforcement learning-based framework, CoDE (Construct Decision-making Environment), that can provide reliable localization results for tampered areas in forged images. We model the forgery localization task as a Markov Decision Process (MDP), where each pixel is equipped with an agent that performs a Gaussian distribution-based continuous action to iteratively update the respective forgery probability, so as to achieve pixel-level image forgery localization. To construct the state transitions within the MDP, we propose a twin-flow state encoder to handle the updated state, which consists of the forged image and its corresponding forgery probability map. Moreover, considering that the tampered area is often sparse in practical image tampering scenarios, we design a reward function specifically for such sparse tampered areas. This reward function guides the agent to learn the optimal strategy for maximizing the cumulative reward more effectively. Extensive experiments conducted on a variety of benchmark datasets demonstrate CoDE’s superior localization accuracy and robustness against image degradation caused by transmission through Online Social Networks (OSNs) and various post-processing attacks.
- Conference Article
53
- 10.1109/wacv56688.2023.00462
- Jan 1, 2023
Conventional forgery localization methods usually rely on different forgery footprints such as JPEG artifacts, edge inconsistency, camera noise, etc., with cross-entropy loss to locate manipulated regions. However, these methods have the disadvantage of over-fitting and focusing on only a few specific forgery footprints. On the other hand, real-life manipulated images are generated via a wide variety of forgery operations and thus leave behind a wide variety of forgery footprints. Therefore, we need a more general approach for image forgery localization that can work well under a variety of forgery conditions. A key assumption underlying forged region localization is that there remains a difference in feature distribution between untampered and manipulated regions in each forged image sample, irrespective of the forgery type. In this paper, we aim to leverage this difference in feature distribution to aid image forgery localization. Specifically, we use a contrastive loss to learn a mapping into a feature space where the features of untampered and manipulated regions are well separated for each image. Our method also has the advantage of localizing manipulated regions without requiring any prior knowledge or assumption about the forgery type. We demonstrate that our work outperforms several existing methods on three benchmark image manipulation datasets. Code is available at https://github.com/niloy193/CFLNet
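The idea of separating untampered and manipulated features with a contrastive loss can be illustrated with a generic supervised contrastive loss over a handful of feature vectors. This is a sketch of the general technique, not the loss used in CFLNet: the cosine similarity, the temperature of 0.1, and the function names are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss.

    labels[i] is 1 for a feature from a manipulated region and 0 for an
    untampered one. Anchors without any positive are skipped.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total -= sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))  # small: clusters match labels
print(supervised_contrastive_loss(feats, [0, 1, 0, 1]))  # large: clusters contradict labels
```

The loss is low when features with the same label cluster together and high when the clusters cut across the labels, which is the separation property the abstract relies on.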
- Conference Article
35
- 10.1109/iccv48922.2021.01476
- Oct 1, 2021
With the wide use of image editing tools, forged images (splicing, copy-move, removal, etc.) have become a great public concern. Although existing image forgery localization methods achieve fairly good results on several public datasets, most of them perform poorly when the forged images are JPEG compressed, as is usually the case on social networks. To tackle this issue, this paper proposes a self-supervised domain adaptation network for JPEG-resistant image forgery localization, composed of a backbone network with a Siamese architecture and a compression approximation network (ComNet). To improve robustness against JPEG compression, ComNet is customized to approximate the JPEG compression operation through self-supervised learning, generating JPEG-agent images with general JPEG compression characteristics. The backbone network is then trained with a domain adaptation strategy to localize the tampering boundary and region and to alleviate the domain shift between uncompressed and JPEG-agent images. Extensive experimental results on several public datasets show that the proposed method outperforms or rivals other state-of-the-art methods in image forgery localization, especially under JPEG compression with unknown quality factors (QFs).
- Research Article
2
- 10.1109/tdsc.2024.3522190
- May 1, 2025
- IEEE Transactions on Dependable and Secure Computing
Deep learning-based image forgery localization methods have achieved remarkable results but cannot maintain comparable performance when the forged images are JPEG compressed, a format widely used in everyday information transmission. Robustness against JPEG compression has become a bottleneck for the practical application of image forgery localization. To address this issue, a robust image forgery localization framework is proposed to counter the performance degradation caused by JPEG compression. Specifically, a progressive disentanglement strategy is proposed that uses coarse-grained image disentanglement to mitigate the detrimental effects of general JPEG compression and fine-grained element disentanglement to separate multi-scale artifacts, thereby minimizing interference from content information. Moreover, the decision strategy is designed to reinforce subtle signals from tampered areas; it includes an artifacts fusion block that reasons over multi-scale artifacts and a dual attention block that learns more forgery-related features. Extensive visualizations and experiments demonstrate that the method achieves competitive performance in general JPEG-resistant image forgery localization, especially in generalization experiments.
- Research Article
4
- 10.1007/s11063-025-11774-6
- Jul 9, 2025
- Neural Processing Letters
In the field of semantic segmentation, the limited receptive field of convolutional neural networks leads to insufficient extraction of global features, thereby affecting segmentation accuracy. To address this issue, a Hierarchical Hybrid Encoder Network (HHEnet) based on Transformers is proposed for semantic segmentation. First, to solve the problem of limited global feature information caused by the network’s limited receptive field, a Hierarchical Hybrid Encoder (HHE) is introduced, which consists of a Hierarchical Convolutional Encoder (HCE) and a Hierarchical Transformer Encoder (HTE). The encoder combines the advantages of convolution and transformers, allowing for effective extraction of both shallow and deep features. To further enhance spatial and global semantic information, a Feature Enhancement Module (FEM) is introduced, which consists of a spatial feature enhancement module (SEM) and a global feature enhancement module (GEM) that enhance spatial detail information and global semantic information, respectively, thereby improving segmentation accuracy. Finally, to alleviate the discrepancy between the features of the convolutional encoder and the transformer encoder, a Feature Guidance Module (FGM) is introduced. Experiments on the Cityscapes, ADE20K and PASCAL VOC2012 datasets achieved mIoU scores of up to 81.9%, 49.4% and 79.1%, respectively. These results confirm that the proposed HHEnet achieves higher segmentation accuracy than state-of-the-art networks.
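For readers unfamiliar with the mIoU metric quoted above, the following sketch computes mean intersection-over-union from flat lists of predicted and ground-truth class labels. Benchmarks usually accumulate the counts over a whole dataset before averaging; this per-sample version and the toy labels are simplifications for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2, so the mean is 0.5.
print(round(mean_iou(pred, target, num_classes=3), 3))  # 0.5
```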
- Conference Article
2
- 10.1109/ctmcd53128.2021.00042
- Apr 1, 2021
Until now, image forgery has caused great harm in many aspects, such as certificate falsification, fake news, and Internet rumors. As a result, algorithms for image forensics are of great importance. Existing image forensics algorithms are mature in image forgery detection. Nevertheless, there is still room for improvement when it comes to image forgery localization. This paper modified and migrated U²-Net to image forensics, and conducted several experiments to demonstrate the effectiveness of U²-Net in contrast to ManTra-Net. The experimental results demonstrate that U²-Net is not only capable of image forgery detection but also of image forgery localization, and that in some cases U²-Net is even more powerful than ManTra-Net in image forgery localization.
- Research Article
- 10.1007/s44196-025-00892-7
- Jul 8, 2025
- International Journal of Computational Intelligence Systems
With the advancements in multimedia software and hardware technology, image forgery localization has become an important challenge in digital forensics. To improve the efficiency and stability of image forgery detection, we propose a mixed-domain perception and multi-expert decoding recognition model. First, we design an alignment strategy that utilizes both RGB and frequency domain information of images. This strategy adapts to the multi-dimensional distribution characteristics of the original data, enhancing the discrimination of tampered regions. Next, we employ a hybrid expert modeling approach to improve the model’s robustness in the representation space through feature selection and recombination. Additionally, we introduce a region-weighted contrastive learning method to better localize and focus on tampered regions. Experiments on four datasets (CASIA, NIST, COVERAGE, and IMD) show that our proposed model achieves an improvement in AUC ranging from 0.15 to 1.9% compared to the existing advanced methods. These results indicate that our approach contributes to more accurate image forgery localization, offering potential benefits for digital forensics and multimedia security applications.
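The AUC figures reported above refer to the area under the ROC curve, which for localization is typically computed over per-pixel forgery scores. A minimal sketch using the pairwise (Mann-Whitney) formulation follows. The scores and labels are made-up toy values, and the quadratic pairwise loop is only practical for small inputs.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    labels use 1 for tampered pixels and 0 for authentic pixels;
    ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
# 5 of the 6 positive/negative pairs are ranked correctly.
print(round(roc_auc(scores, labels), 4))  # 0.8333
```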
- Conference Article
7
- 10.1109/csnt54456.2022.9787662
- Apr 23, 2022
Due to the broad accessibility of camera systems, photography has grown in popularity. Photos are significant in our daily lives since they carry a wealth of information, and it is frequently necessary to enhance photographs to obtain more information from them. Although many technologies are available to enhance images, they are also often used to fabricate photos, leading to the spread of disinformation. Image forgery is becoming a serious concern. Several conventional frameworks have been developed in the past to locate image forgeries. Convolutional neural networks (CNNs) have gained much popularity in recent years and have influenced image forgery localization. One of the most difficult types of image forgery is image splicing, in which a part of one image is copied into another image. Existing image forgery localization techniques have some limitations, so it is essential to develop a technique for effectively and accurately locating forgeries in tampered images. We present a deep learning-based approach for localizing forgery in an image using image patches. To classify a pixel, a patch is taken around it and passed to a CNN, which predicts whether the pixel belongs to the tampered region. The proposed method efficiently predicts the boundary pixels of the tampered region and the background image. The technique has been rigorously evaluated, and the experimental results obtained on the CASIA 2.0 database are extremely encouraging.
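The per-pixel patch extraction described above can be sketched as follows. The abstract does not state the patch size or how image borders are handled, so the odd patch size and the edge-replication padding here are assumptions, and the CNN classifier itself is omitted.

```python
def extract_patch(image, row, col, size):
    """Return the size x size patch centred on (row, col).

    `size` must be odd. Coordinates outside the image are clamped to the
    nearest edge pixel (edge replication, an assumption of this sketch).
    """
    half = size // 2
    height, width = len(image), len(image[0])

    def clamp(value, upper):
        return max(0, min(value, upper - 1))

    return [
        [image[clamp(row + dr, height)][clamp(col + dc, width)]
         for dc in range(-half, half + 1)]
        for dr in range(-half, half + 1)
    ]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(extract_patch(image, 1, 1, 3))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(extract_patch(image, 0, 0, 3))  # [[1, 1, 2], [1, 1, 2], [4, 4, 5]]
```

In a full pipeline each such patch would be passed to the classifier, and the predicted label assigned to the centre pixel.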
- Research Article
- 10.3390/math13142285
- Jul 16, 2025
- Mathematics
Most image forgery localization methods rely on supervised learning, requiring large labeled datasets for training. Recently, several unsupervised approaches based on the variational autoencoder (VAE) framework have been proposed for forged pixel detection. In these approaches, the latent space is built by a simple Gaussian distribution or a Gaussian Mixture Model. Despite their success, there are still some limitations: (1) A simple Gaussian distribution assumption in the latent space constrains performance due to the diverse distribution of forged images. (2) Gaussian Mixture Models (GMMs) introduce non-convex log-sum-exp functions in the Kullback–Leibler (KL) divergence term, leading to gradient instability and convergence issues during training. (3) Estimating GMM mixing coefficients typically involves either the expectation-maximization (EM) algorithm before VAE training or a multilayer perceptron (MLP), both of which increase computational complexity. To address these limitations, we propose the Deep ViT-VAE-GMM (DVVG) framework. First, we employ Jensen’s inequality to simplify the KL divergence computation, reducing gradient instability and improving training stability. Second, we introduce convolutional neural networks (CNNs) to adaptively estimate the mixing coefficients, enabling an end-to-end architecture while significantly lowering computational costs. Experimental results on benchmark datasets demonstrate that DVVG not only enhances VAE performance but also improves efficiency in modeling complex latent distributions. Our method effectively balances performance and computational feasibility, making it a practical solution for real-world image forgery localization.
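The Jensen step mentioned above can be written out as a standard bound. This is a sketch under assumed notation, none of which is taken from the paper: $q(z \mid x)$ is the encoder posterior, the prior is a mixture $p(z) = \sum_{k=1}^{K} \pi_k\, p_k(z)$ with $\pi_k \ge 0$ and $\sum_k \pi_k = 1$, and the paper's exact derivation may differ.

```latex
\begin{aligned}
\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)
  &= \mathbb{E}_{q}\Big[\log q(z \mid x) - \log \sum_{k=1}^{K} \pi_k\, p_k(z)\Big] \\
  &\le \mathbb{E}_{q}\Big[\log q(z \mid x) - \sum_{k=1}^{K} \pi_k \log p_k(z)\Big] \\
  &= \sum_{k=1}^{K} \pi_k\, \mathrm{KL}\big(q(z \mid x)\,\|\,p_k(z)\big).
\end{aligned}
```

The inequality holds because the logarithm is concave, so $\log \sum_k \pi_k\, p_k(z) \ge \sum_k \pi_k \log p_k(z)$. The bound replaces the log-sum-exp term with a weighted sum of component-wise divergences, each of which has a closed form when $q$ and $p_k$ are Gaussian.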
- Conference Article
7
- 10.1109/icaee47123.2019.9015093
- Nov 1, 2019
In this paper, we aimed to filter radiographic weld images to facilitate weld defect detection and to improve automatic industrial inspection. The noisy images were contaminated by three types of noise: multiplicative speckle noise, additive Gaussian white noise, and mixed noise combining the two. Wavelet-based filters and anisotropic diffusion techniques have proven their worth in reducing both additive Gaussian noise and speckle noise. We presented in this work a filtering algorithm based on diffusion in the wavelet packet domain to enhance the quality of the noisy weld images. Comparing the performance of this approach with other wavelet-based methods, experiments proved the effectiveness of wavelet packet diffusion in reducing noise and preserving defect details.
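The three noise types named above can be simulated with a few lines of code. This is a sketch using common textbook models (additive noise as x + n, speckle as x * (1 + n) with zero-mean Gaussian n). The abstract does not give the exact noise models or parameters, so these forms, the order in which the mixed noise is applied, and all numeric values are assumptions.

```python
import random


def add_gaussian(pixels, sigma, rng):
    """Additive Gaussian white noise: y = x + n, n ~ N(0, sigma^2)."""
    return [x + rng.gauss(0.0, sigma) for x in pixels]


def add_speckle(pixels, sigma, rng):
    """Multiplicative speckle noise: y = x * (1 + n), n ~ N(0, sigma^2)."""
    return [x * (1.0 + rng.gauss(0.0, sigma)) for x in pixels]


def add_mixed(pixels, sigma_mult, sigma_add, rng):
    """Mixed noise: speckle first, then additive Gaussian (assumed order)."""
    return add_gaussian(add_speckle(pixels, sigma_mult, rng), sigma_add, rng)


rng = random.Random(0)  # fixed seed so the example is repeatable
row = [0.0, 50.0, 100.0, 200.0]
print(add_gaussian(row, 5.0, rng))
print(add_speckle(row, 0.1, rng))   # dark pixels are disturbed less than bright ones
print(add_mixed(row, 0.1, 5.0, rng))
```

The speckle model scales with pixel intensity while the additive model does not, which is why a filter tuned for one kind of noise may handle the other poorly.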
- Research Article
3
- 10.3390/electronics13193919
- Oct 3, 2024
- Electronics
While most current image forgery localization (IFL) deep learning models focus primarily on the foreground of tampered images, they often neglect the essential complementary background semantic information. This oversight tends to create significant gaps in these models’ ability to thoroughly interpret and understand a tampered image, thereby limiting their effectiveness in extracting critical tampering traces. Given the above, this paper presents a novel contrastive learning and edge-reconstruction-driven complementary learning network (CECL-Net) for image forgery localization. CECL-Net enhances the understanding of tampered images by employing a complementary learning strategy that leverages foreground and background features, where a unique edge extractor (EE) generates precise edge artifacts, and edge-guided feature reconstruction (EGFR) utilizes the edge artifacts to reconstruct a fully complementary set of foreground and background features. To carry out the complementary learning process more efficiently, we also introduce a pixel-wise contrastive supervision (PCS) method that attracts consistent regions in features while repelling different regions. Moreover, we propose a dense fusion (DF) strategy that utilizes multi-scale and mutual attention mechanisms to extract more discriminative features and improve the representational power of CECL-Net. Experiments conducted on two benchmark datasets, one Artificial Intelligence (AI)-manipulated dataset and two real challenge datasets, indicate that our CECL-Net outperforms seven state-of-the-art models on three evaluation metrics.