Abstract

Existing deep learning-based image inpainting algorithms produce blur, artifacts, and semantically inaccurate content when repairing images with large, irregular defect areas. To address these problems, this paper combines the U-Net architecture with the idea of a generative adversarial network and proposes a generative image inpainting network based on an attention transfer cross-layer mechanism. The network consists of two parts, a generator and a discriminator. The generator introduces a cross-layer attention transfer network built on the U-Net architecture: after the input image is encoded, the cross-layer attention transfer network reconstructs the encoding feature map of each layer. Decoding is performed by fusing these features, through skip connections, with the corresponding latent features in the multi-scale decoder, which addresses the information loss between the encoder and decoder layers. Each decoding layer of the multi-scale decoder is also encouraged to generate consistent content, and the repaired image is obtained at the end. After decoding, the decoding feature map of each layer is converted into an RGB image at the corresponding scale, and an L1 reconstruction loss is applied at each scale. By minimizing the adversarial loss, the discriminator learns to determine the authenticity of the repaired image. Perceptual loss and style loss are introduced to evaluate the output image, constraining the generator to produce more realistic and plausible content. Qualitative and quantitative comparisons with current mainstream image inpainting algorithms are carried out on the CelebA, Façade, and Places2 datasets. The results show that the proposed method outperforms the others in terms of mean L1 loss, PSNR, and SSIM, and that its output is more consistent with human visual perception.
The cross-layer attention transfer network proposed in this paper can effectively reconstruct more detailed encoding feature maps, and it plays an active guiding role in the coding process through skip connections. When repairing large, irregular defect areas, the image inpainting network generates content that is highly consistent with the structure and semantics of the real image and that better matches human visual experience.
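
The multi-scale L1 reconstruction loss and the PSNR metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes grayscale images stored as nested Python lists, assumes the ground truth is brought to each decoder scale by 2x2 average pooling, and assumes the per-scale losses are summed without weights. The function names (`downsample2x`, `mean_l1`, `multiscale_l1`, `psnr`) are hypothetical and chosen only for this example.

```python
import math


def downsample2x(img):
    """Average-pool a 2D image (list of rows) by a factor of 2 (assumed resizing rule)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def mean_l1(a, b):
    """Mean absolute difference between two equally sized 2D images."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("images must have the same shape")
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def multiscale_l1(outputs, target):
    """Sum the L1 loss over decoder scales.

    `outputs` is ordered from full resolution downwards, each entry half the
    size of the previous one; the target is pooled to match each scale.
    """
    loss = 0.0
    ref = target
    for out in outputs:
        loss += mean_l1(out, ref)
        if len(ref) >= 2 and len(ref[0]) >= 2:
            ref = downsample2x(ref)
    return loss


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with maximum value `peak`."""
    n = len(a) * len(a[0])
    mse = sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

In a real training setup the same idea would be applied to RGB tensors in a deep learning framework, and each scale could carry its own weight; the abstract does not specify those details.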
