Image inpainting is a significant task in computer vision that aims to fill in damaged regions with visually realistic content. With the development of deep learning, generative adversarial network (GAN)-based image inpainting approaches have achieved remarkable progress. However, these methods utilize only one-sided structure information to assist inpainting, which cannot achieve satisfactory results, especially when synthesizing complex images with large missing areas. To tackle this problem, a wavelet-based self-attention GAN (WSA-GAN) with collaborative feature fusion is proposed, which is embedded with a wavelet-based self-attention (WSA) module and a collaborative feature fusion (CFF) module. The WSA is designed to model long-range dependencies among multi-scale frequency information, highlighting significant structural details to better generate texture boundaries. The CFF couples channel-guided spatial and space-affected channel streams to facilitate the interaction of spatial and channel features, which effectively avoids potential domain conflicts. In addition, a novel wavelet consistency loss and a hierarchical pyramid feature matching (PFM) discriminator are introduced to stabilize model training. Extensive experiments on three public datasets, Paris StreetView, CelebA-HQ, and Places, demonstrate that the proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
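The abstract does not specify the WSA module at code level; a minimal conceptual sketch of the idea it names, combining a single-level Haar wavelet decomposition with plain self-attention across the resulting frequency sub-bands, might look like the following. All function names, the single decomposition level, and the one-token-per-sub-band design are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def haar_dwt(x):
    # Single-level 2D Haar decomposition of an (H, W) feature map
    # into four half-resolution frequency sub-bands: LL, LH, HL, HH.
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def self_attention(tokens):
    # Plain scaled dot-product self-attention over an (N, D) token matrix.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

def wavelet_self_attention(feature_map):
    # Decompose the map into frequency sub-bands, flatten each sub-band
    # into one token, and let attention mix information across frequencies,
    # i.e. model long-range dependencies among multi-scale frequency cues.
    bands = haar_dwt(feature_map)
    tokens = np.stack([b.ravel() for b in bands])   # shape (4, H*W/4)
    mixed = self_attention(tokens)
    return mixed.reshape(4, *bands[0].shape)        # back to sub-band grids

feat = np.random.default_rng(0).standard_normal((8, 8))
out = wavelet_self_attention(feat)
print(out.shape)  # (4, 4, 4): four attended 4x4 sub-bands
```

In a real inpainting network this would operate per channel on learned feature maps and be followed by an inverse wavelet transform; the sketch only illustrates how frequency decomposition and attention can be composed.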