Abstract

Text-to-image synthesis, which aims to synthesize realistic images from given descriptions, is a fundamental and challenging task in computer vision. Recent text-to-image synthesis methods have greatly improved the quality of synthesized images. However, very few works have explored face synthesis, which holds great potential for face-related applications and the public-safety domain. Moreover, the faces generated by existing methods are generally of poor quality and show low consistency with the given text. To tackle these issues, in this paper we build a novel end-to-end generative adversarial network with a dual-channel generator, named DualG-GAN, to improve both the quality of the generated images and their consistency with the text description. In DualG-GAN, a dual-channel generator block is introduced to improve the consistency between the synthesized image and the input description, and a novel loss is designed to increase the similarity between the generated image and the ground truth at three different semantic levels. Extensive experiments demonstrate that DualG-GAN achieves state-of-the-art results on the SCU-Text2face dataset. To further verify its performance, we compare DualG-GAN with current state-of-the-art methods on text-to-image synthesis tasks, where quantitative and qualitative results show that it achieves the best performance in both Fréchet inception distance (FID) and R-precision. As only a few works focus on text-to-face synthesis, this work can serve as a baseline for future research.
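
The abstract does not spell out how the three semantic levels are combined in the proposed loss. As a rough illustration only, a minimal sketch of one plausible formulation is given below, assuming a pixel-level term, a feature-level term from some image encoder, and a text-embedding-level term; the tensor names and weights here are hypothetical and not taken from the paper.

import torch
import torch.nn.functional as F

def multi_level_loss(fake_img, real_img, fake_feat, real_feat,
                     fake_emb, text_emb,
                     w_pixel=1.0, w_feat=1.0, w_sem=1.0):
    # Pixel-level term: L1 distance between generated and ground-truth images.
    pixel_loss = F.l1_loss(fake_img, real_img)
    # Feature-level term: L2 distance between intermediate features of both
    # images, e.g. extracted by a pretrained encoder (assumed, not specified).
    feat_loss = F.mse_loss(fake_feat, real_feat)
    # Semantic-level term: cosine distance between the generated image's
    # embedding and the text embedding, encouraging text-image consistency.
    sem_loss = 1.0 - F.cosine_similarity(fake_emb, text_emb, dim=-1).mean()
    # Weighted sum of the three terms; the weights are illustrative.
    return w_pixel * pixel_loss + w_feat * feat_loss + w_sem * sem_loss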
