Abstract

To address the low quality and missing facial attributes of images produced in the text-to-face synthesis task, this paper proposes EFA, a general embedding method for strengthening face attributes in text-to-image synthesis models. First, we re-encode the irregular word-level descriptions scattered throughout sentences to form word encodings. Then, we design an embedded local feature extraction layer for the discriminators of different models to learn more specific information related to face attributes. Next, we associate the word encodings with the extracted face image feature regions to obtain a face-attribute-domain classification loss for both real and generated images. Finally, during training, we use this loss to constrain the generator and discriminator and thereby improve their performance. This method improves the quality of text-to-face synthesis and strengthens the semantic correlation between the generated image and the text description. Extensive experiments on the recently released Multi-Modal CelebA-HQ dataset validate our method, and the results are competitive with the state of the art. In particular, our approach improves FID by 47.75% over AttnGAN, 33.68% over ControlGAN, 10.05% over DM-GAN, and 12.52% over DF-GAN. Code is available at https://github.com/cookie-ke/EFA.
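The association step between word encodings and image feature regions can be sketched as follows. This is a minimal illustrative assumption of how such a loss might be computed (word-to-region attention followed by a binary presence classification), not the paper's exact formulation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_domain_loss(word_enc, regions, labels):
    """Hypothetical sketch of a word-region association loss.

    word_enc: (T, d) re-encoded attribute words
    regions:  (N, d) local features, e.g. from a discriminator's
              embedded local feature extraction layer
    labels:   (T,) 1 if the attribute should appear in the image, else 0
    """
    # each word attends over the image feature regions
    attn = softmax(word_enc @ regions.T, axis=1)          # (T, N)
    attended = attn @ regions                             # (T, d)

    # cosine similarity between each word and its attended region mix
    num = (word_enc * attended).sum(axis=1)
    den = (np.linalg.norm(word_enc, axis=1)
           * np.linalg.norm(attended, axis=1) + 1e-8)
    score = num / den                                     # in [-1, 1]

    # map similarity to a presence probability and score with BCE
    prob = 1.0 / (1.0 + np.exp(-score))
    return float(-np.mean(labels * np.log(prob + 1e-8)
                          + (1 - labels) * np.log(1 - prob + 1e-8)))
```

In a setup like this, the loss would be computed for both real and generated images so that the discriminator learns attribute-grounded local features while the generator is pushed to render the attributes named in the text.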
