Abstract

As a sub-topic of Text-to-Image synthesis, Text-to-Face generation has great potential in face-related applications. In this paper, we propose a generic Text-to-Face framework, namely TextFace, to achieve diverse and high-quality face image generation from text descriptions. We introduce a novel method called Text-to-Style mapping, in which the text description is directly encoded into the latent space of a pretrained StyleGAN. Guided by our text-image similarity matching and face-captioning-based text alignment, the textual latent code is fed into the generator of a well-trained StyleGAN to produce diverse, high-resolution (1024×1024) face images. Furthermore, our model inherently supports semantic face editing via text descriptions. Finally, experimental results quantitatively and qualitatively demonstrate the superior performance of our model.
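The Text-to-Style idea described above can be sketched in a few lines: a text embedding is mapped into the latent (style) space of a pretrained StyleGAN, whose generator then synthesizes a face. The sketch below is purely illustrative; the encoder, the linear mapper, the stub generator, and all dimensions are assumptions, not the paper's actual architecture (which learns the mapping with text-image similarity matching and caption-based alignment).

```python
import numpy as np

TEXT_DIM = 512     # assumed dimensionality of the text encoder output
LATENT_DIM = 512   # StyleGAN's intermediate latent space is commonly 512-d

rng = np.random.default_rng(0)

def encode_text(description: str) -> np.ndarray:
    """Stand-in text encoder: deterministically maps a string to an embedding.
    A real system would use a learned language encoder."""
    seed = sum(ord(c) for c in description) % (2**32)
    return np.random.default_rng(seed).standard_normal(TEXT_DIM)

# Text-to-Style mapping: here just a fixed random linear map for illustration;
# the paper learns this mapping so that text and image semantics align.
W_map = rng.standard_normal((LATENT_DIM, TEXT_DIM)) / np.sqrt(TEXT_DIM)

def text_to_style(description: str) -> np.ndarray:
    """Map a text description into the (assumed) StyleGAN latent space."""
    return W_map @ encode_text(description)

def stylegan_generator(w: np.ndarray, resolution: int = 1024) -> np.ndarray:
    """Stub generator: a real pretrained StyleGAN would synthesize a face
    image from the latent code w; here we just emit a placeholder tensor."""
    pix = np.outer(np.tanh(w[:3]), np.ones(resolution * resolution))
    return pix.reshape(3, resolution, resolution)

w = text_to_style("a young woman with long hair and a gentle smile")
image = stylegan_generator(w, resolution=64)  # small resolution for the demo
print(image.shape)  # (3, 64, 64)
```

Because the latent code is produced directly in the generator's style space, editing a face amounts to moving the code with a new description and re-running the same frozen generator, which is what makes text-driven semantic editing fall out of the framework for free.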

