Abstract

Synthesizing realistic, fine-grained images from text descriptions is a significant computer vision task. Although many GAN-based methods have been proposed for this task, generating high-quality images that are consistent with the text remains difficult. Existing GAN-based methods ignore important words because they rely on fixed initial word features in the generator, and their discriminators neglect to learn semantic consistency between images and texts. In this article, we propose a novel attentional generation and contrastive adversarial framework for fine-grained text-to-image synthesis, termed Word Self-Update Contrastive Adversarial Networks (WSC-GAN). Specifically, we introduce a dual attention module for modeling color details and semantic information. With a newly designed word self-update module, the generator can leverage visually important words to compute attention maps in the feature synthesis module. Furthermore, we design multi-branch contrastive discriminators to maintain better consistency between the generated image and the text description. Two novel contrastive losses are proposed for our discriminators to impose image-sentence and image-word consistency constraints. Extensive experiments on the CUB and MS-COCO datasets demonstrate that our method achieves better performance than state-of-the-art methods.
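
To make the image-sentence consistency constraint concrete, the sketch below shows a standard InfoNCE-style contrastive loss of the kind the abstract describes: matched (image, sentence) pairs in a batch are pulled together while mismatched pairs are pushed apart. This is a minimal illustration, not the paper's actual implementation; the function name, feature dimensions, and temperature value are assumptions.

```python
# Hypothetical sketch of an image-sentence contrastive loss in the InfoNCE
# style. Feature shapes and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def image_sentence_contrastive_loss(img_feat, sent_feat, temperature=0.1):
    """Pull matched (image, sentence) pairs together and push apart
    mismatched pairs within the same batch.

    img_feat:  (B, D) image features from a discriminator image branch
    sent_feat: (B, D) sentence embeddings; row i matches image i
    """
    # Normalize so the dot product is cosine similarity.
    img = F.normalize(img_feat, dim=-1)
    sent = F.normalize(sent_feat, dim=-1)

    # (B, B) similarity matrix; diagonal entries are matched pairs.
    logits = img @ sent.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)

    # Symmetric cross-entropy: image-to-sentence and sentence-to-image.
    loss_i2s = F.cross_entropy(logits, targets)
    loss_s2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2s + loss_s2i)

# Example usage with random features:
# B, D = 8, 256
# loss = image_sentence_contrastive_loss(torch.randn(B, D), torch.randn(B, D))
```

An image-word variant would apply the same idea at a finer granularity, scoring region features against individual word embeddings rather than a single sentence vector.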
