Abstract

The rapid development of digital image inpainting technology poses a serious hidden threat to the security of multimedia information. In this paper, a deep network called the frequency attention-based dual-stream network (FADS-Net) is proposed for locating inpainted regions. FADS-Net consists of a dual-stream encoder and an attention-based associative decoder. The dual-stream encoder includes two feature extraction streams: the raw input stream (RIS) and the frequency recalibration stream (FRS). RIS captures feature maps directly from the raw input, while FRS extracts features after recalibrating the input through learning in the frequency domain. In addition, a module based on dense connections is designed to ensure efficient extraction and full fusion of the dual-stream features. The attention-based associative decoder consists of a main decoder and two branch decoders. The main decoder up-samples and refines the fused features using attention mechanisms and skip connections, and ultimately generates the predicted mask for the inpainted image. The two branch decoders further supervise the training of the two feature streams, ensuring that both work effectively. A joint loss function is designed to supervise the training of the entire network and of the two feature extraction streams, with the aim of optimal forensic performance. Extensive experimental results demonstrate that the proposed FADS-Net achieves superior localization accuracy and robustness on multiple datasets compared with state-of-the-art inpainting forensics methods.
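
To make the joint supervision concrete, the following is a minimal sketch of how a loss over the main decoder and the two branch decoders might be combined. The abstract does not give the form of the loss, so everything here is an assumption for illustration: binary cross-entropy as the per-output term, a weighted sum as the combination rule, and the names `bce`, `joint_loss`, `w_ris`, and `w_frs` are all hypothetical.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        # Clamp to avoid log(0) on saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def joint_loss(main_pred, ris_pred, frs_pred, target, w_ris=0.5, w_frs=0.5):
    """Hypothetical joint loss: main-decoder term plus weighted branch terms.

    main_pred: mask probabilities from the main decoder (fused features)
    ris_pred:  mask probabilities from the branch decoder on the RIS stream
    frs_pred:  mask probabilities from the branch decoder on the FRS stream
    w_ris, w_frs: assumed weights; the paper's actual values are not stated here.
    """
    return (
        bce(main_pred, target)
        + w_ris * bce(ris_pred, target)
        + w_frs * bce(frs_pred, target)
    )
```

In this sketch each predicted mask is flattened to a list of per-pixel probabilities and compared against the same ground-truth mask, so a stream that contributes nothing useful is still penalized through its own branch term.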
