Abstract

In this paper, we propose a method called LSAI (learning semantic alignment from image) to recover corrupted image patches for text-guided image inpainting. First, a multimodal preliminary (MP) module is designed to effectively encode global features for images and textual descriptions, where each local image patch and each word are taken into account via multi-head self-attention. Second, non-Euclidean semantic relations between images and textual descriptions are captured by building a semantic relation graph (SRG). By aggregating semantic relations with graph convolution, the constructed SRG highlights the words that meaningfully describe the image content and alleviates the impact of distracting words. In addition, a text-image matching loss is devised to penalize restored images whose visual semantics diverge from the textual description. Quantitative and qualitative experiments on two public datasets demonstrate that our proposed LSAI outperforms prior methods (e.g., reducing FID from 30.87 to 16.73 on the CUB-200-2011 dataset).
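
The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of how an MP-style joint encoder and an SRG-style graph aggregation could look. All names (MPModule, SemanticRelationGraph), dimensions, and the softmax-normalized adjacency are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MPModule(nn.Module):
    """Sketch of a multimodal preliminary (MP) module: image-patch and word
    tokens are concatenated and encoded with multi-head self-attention, so
    every patch attends to every word and vice versa (dims are assumptions)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_tokens, word_tokens):
        # patch_tokens: (B, Np, dim), word_tokens: (B, Nw, dim)
        tokens = torch.cat([patch_tokens, word_tokens], dim=1)  # (B, Np+Nw, dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)                   # residual + norm
        n_patch = patch_tokens.size(1)
        return tokens[:, :n_patch], tokens[:, n_patch:]         # split back

class SemanticRelationGraph(nn.Module):
    """Sketch of an SRG step: a soft adjacency is built from patch-word
    similarity and word semantics are aggregated onto patches with one
    graph-convolution-style propagation, so words unrelated to the image
    content receive low edge weights."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, patch_tokens, word_tokens):
        # Affinity between every patch and word, row-normalized as edge weights.
        affinity = torch.bmm(patch_tokens, word_tokens.transpose(1, 2))  # (B, Np, Nw)
        weights = F.softmax(affinity / patch_tokens.size(-1) ** 0.5, dim=-1)
        # Aggregate word semantics into each patch (one propagation step).
        aggregated = torch.bmm(weights, self.proj(word_tokens))          # (B, Np, dim)
        return patch_tokens + aggregated
```

A softmax over the word axis is one plausible way to realize "alleviating distracting words": each patch distributes a fixed attention budget, so off-topic words can only receive small edge weights.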
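The text-image matching loss is likewise unspecified in the abstract; one common realization of "penalizing semantic divergence" is a hinge-style contrastive loss over global image and text features, sketched below. The margin value and batch-negative scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def text_image_matching_loss(image_feat, text_feat, margin=0.2):
    """Contrastive matching-loss sketch: pull each restored image's global
    feature toward its own description and push it away from the other
    descriptions in the batch (margin is an illustrative assumption)."""
    image_feat = F.normalize(image_feat, dim=-1)   # (B, dim)
    text_feat = F.normalize(text_feat, dim=-1)     # (B, dim)
    sim = image_feat @ text_feat.t()               # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)                  # matched image-text pairs
    # Penalize any mismatched pair scoring within `margin` of the matched one.
    cost = (margin + sim - pos).clamp(min=0)
    cost.fill_diagonal_(0)                         # ignore the positive pairs
    return cost.mean()
```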
