To achieve interaction between structure and texture information in generative adversarial image inpainting networks and to improve the semantic plausibility of the restored images, this paper departs from the original two-stage inpainting idea, in which texture and structure are restored separately, and constructs a multi-scale fusion approach to image generation that decomposes inpainting into two collaborative subtasks: structure generation and texture synthesis under structural constraints. We introduce a self-attention mechanism into the partial convolutions of the encoder to enhance the model's acquisition of long-range contextual information, and design a multi-scale fusion network that fuses the generated structure and texture features, so that structure and texture information can be reused and compensated by reconstruction, perceptual and style losses, enabling the fused images to achieve global consistency. In the training phase, a feature matching loss is introduced to improve the plausibility of the generated structures. Finally, comparison experiments with other inpainting networks on the CelebA, Paris StreetView and Places2 datasets demonstrate that the proposed method achieves better objective evaluation metrics, restores the structural and texture information of corrupted images more effectively, and delivers better overall inpainting performance.
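As a minimal illustration of the encoder-side self-attention described above, the following PyTorch-style sketch shows a SAGAN-style self-attention block of the kind that could be placed after a partial-convolution layer to capture long-range context. The module name, channel-reduction factor, and placement are illustrative assumptions and do not reproduce the paper's exact implementation.

```python
# Minimal sketch (assumption): a SAGAN-style self-attention block that could follow
# a partial-convolution encoder layer to capture long-range context. The names and
# the channel-reduction factor (8) are illustrative, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Learnable scale so attention is blended in gradually during training.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, c//r)
        k = self.key(x).flatten(2)                      # (b, c//r, h*w)
        attn = F.softmax(torch.bmm(q, k), dim=-1)       # (b, h*w, h*w) attention map
        v = self.value(x).flatten(2)                    # (b, c, h*w)
        out = torch.bmm(v, attn.transpose(1, 2))        # aggregate long-range context
        out = out.view(b, c, h, w)
        return self.gamma * out + x                     # residual connection


# Usage sketch: apply attention to a hypothetical encoder feature map.
if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    print(SelfAttention2d(64)(feats).shape)             # -> torch.Size([2, 64, 32, 32])
```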