Abstract

Recent work has demonstrated a variety of interesting semantic image manipulation methods based on GANs guided by text descriptions. A method based on GAN inversion can achieve versatile image manipulation without a time-consuming preprocessing stage. However, such a method suffers from a lack of self-adaptation due to the intrinsic conflict among its multi-objective losses. Meanwhile, when applied to image manipulation guided by text conditions, it is not robust because of the vast and ambiguous search space. To solve the above problems, we propose RAIN, a novel framework based on GAN inversion that achieves robust and adaptive text-driven image manipulation. As shown in Fig. 1(c), RAIN contains two main parts: CEV Initialization and RAGAN inversion. CEV Initialization adaptively provides a Candidate Editing Vector (CEV) in a short time. RAGAN inversion is a multi-stage optimization scheme that uses the CEV as prior knowledge to prune the search space. In RAGAN inversion, we further explore how to improve the vision-language model's perception capability to restrict the search space. The objective of this paper is to guarantee semantic correctness and image quality in time-constrained scenarios, compared with state-of-the-art text-guided image manipulation methods. Extensive experiments show that RAIN can manipulate images guided by text descriptions while achieving both robustness and self-adaptation.
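Since the abstract only outlines the pipeline at a high level, the following is a minimal sketch of how CEV Initialization and a CEV-guided multi-stage optimization could fit together. The names generator, clip_similarity, candidate_edit_directions, the stage schedule, and the prior weight are all hypothetical placeholders introduced for illustration, not the paper's actual components or losses.

```python
import torch

# Hypothetical sketch of the two-part pipeline described in the abstract.
# `generator` maps a latent code to an image; `clip_similarity` scores
# image-text agreement with a vision-language model. Both are assumed
# interfaces, as are the candidate edit directions and schedules below.

def cev_initialization(w_inverted, text, candidate_edit_directions,
                       generator, clip_similarity):
    """Pick a Candidate Editing Vector (CEV) by quickly scoring a small
    set of candidate edit directions against the text description."""
    best_score, best_cev = -float("inf"), None
    for direction in candidate_edit_directions:
        image = generator(w_inverted + direction)
        score = clip_similarity(image, text)
        if score > best_score:
            best_score, best_cev = score, direction
    return best_cev

def cev_guided_inversion(w_inverted, cev, text, generator, clip_similarity,
                         stages=(50, 100), lr=0.01, prior_weight=1.0):
    """Multi-stage latent optimization that treats the CEV as prior
    knowledge, keeping the search close to w_inverted + cev."""
    w = (w_inverted + cev).clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for steps in stages:
        for _ in range(steps):
            image = generator(w)
            # Semantic term: push the image toward the text description.
            loss = -clip_similarity(image, text)
            # Prior term: penalize drifting away from the CEV-initialized code.
            loss = loss + prior_weight * ((w - w_inverted - cev) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Assumed schedule: relax the prior in later stages.
        prior_weight *= 0.5
    return generator(w)
```

The design intent mirrored here is that the cheap initialization prunes the vast, ambiguous search space before the costlier optimization runs, which is how the abstract motivates both robustness and the time-constrained setting.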
