Abstract

This paper tackles the challenging problem of one‐shot semantic image synthesis from rough sparse annotations, which we call “semantic scribbles.” Namely, from only a single training pair annotated with semantic scribbles, we generate realistic and diverse images with layout control over, for example, facial part layouts and body poses. We present a training strategy that performs pseudo labeling for semantic scribbles using the StyleGAN prior. Our key idea is to construct a simple mapping between StyleGAN features and each semantic class from a single example of semantic scribbles. With such mappings, we can generate an unlimited number of pseudo semantic scribbles from random noise to train an encoder for controlling a pretrained StyleGAN generator. Even with our rough pseudo semantic scribbles obtained via one‐shot supervision, our method can synthesize high‐quality images thanks to our GAN inversion framework. We further offer optimization‐based postprocessing to refine the pixel alignment of synthesized images. Qualitative and quantitative results on various datasets demonstrate improvement over previous approaches in one‐shot settings.
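To illustrate the key idea of mapping generator features to semantic classes from a single annotated example, the sketch below uses a nearest-class-centroid rule in feature space. This is a simplified, hypothetical stand-in for the paper's mapping (the actual feature extraction from StyleGAN layers and the form of the mapping are not specified in this abstract); `fit_class_centroids` and `pseudo_label` are illustrative names, not the authors' API.

```python
import numpy as np

def fit_class_centroids(features, labels, num_classes):
    # features: (H, W, C) feature map from the single annotated example
    # labels:   (H, W) integer class map derived from the semantic
    #           scribbles; pixels not covered by a scribble could be
    #           marked -1 and are simply never selected below.
    C = features.shape[-1]
    centroids = np.zeros((num_classes, C))
    for k in range(num_classes):
        mask = labels == k
        centroids[k] = features[mask].mean(axis=0)  # per-class mean feature
    return centroids

def pseudo_label(features, centroids):
    # Assign each pixel the class whose centroid is nearest in
    # feature space (squared Euclidean distance), yielding a pseudo
    # semantic map for a newly generated feature map.
    flat = features.reshape(-1, features.shape[-1])            # (H*W, C)
    d2 = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1).reshape(features.shape[:2])       # (H, W)
```

Applied to feature maps of images sampled from random noise, such a rule yields unlimited pseudo semantic scribbles with which an encoder can be trained, as the abstract describes.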
