When images contain large missing regions (holes), deep learning-based image inpainting algorithms often produce local structural distortions and blurriness. This paper proposes a novel hierarchical decoding network for image inpainting. First, structural priors extracted from the encoding layer guide the first decoding layer, while residual blocks extract deep-level image features. Second, multiple hierarchical decoding layers progressively fill in the missing regions from top to bottom, with interlayer features and gradient priors guiding information transfer between layers. Furthermore, a Multi-dimensional Efficient Attention module is introduced for feature fusion, extracting image features across different dimensions more effectively than conventional methods. Finally, Efficient Context Fusion combines the reconstructed feature maps from the different decoding layers into the image space, preserving the semantic integrity of the output image. Experiments validate the effectiveness of the proposed method, which demonstrates superior performance in both subjective and objective evaluations. When inpainting images in which 50% to 60% of the area is missing, the proposed method improves PSNR by 0.02 dB and SSIM by 0.001 on the CelebA-HQ dataset, and by 0.22 dB and 0.003 on the Places2 dataset.
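For readers unfamiliar with the PSNR figures quoted above, the following is a minimal sketch of how peak signal-to-noise ratio is conventionally computed in decibels. It is not the authors' evaluation code: the function name `psnr`, the `max_value` parameter (assuming 8-bit pixel intensities), and the toy pixel values are all illustrative assumptions. SSIM is omitted because it requires windowed local statistics.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration; assumes flat sequences of
    intensities in the range [0, max_value].
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy data (made up): one ground-truth row and two candidate restorations.
ground_truth = [52, 55, 61, 59, 79, 61, 76, 61]
restoration_a = [54, 55, 60, 59, 77, 63, 76, 60]
restoration_b = [53, 55, 60, 59, 78, 62, 76, 60]

# A PSNR "improvement" is simply the difference in dB between two methods.
gain_db = psnr(ground_truth, restoration_b) - psnr(ground_truth, restoration_a)
print(f"PSNR gain of B over A: {gain_db:.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of a fraction of a decibel corresponds to a small relative reduction in pixel-wise error.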