Abstract

Text-to-image synthesis aims to generate images whose content is consistent with a given text description. It is a highly challenging task with two main issues: visual reality and content consistency. Recently, significant progress in generative adversarial networks has made it possible to generate images with high visual reality. However, translating a text description into an image with high content consistency remains difficult. To address these issues, it is reasonable to establish a transitional space with an interpretable representation as a bridge that associates text and image. We therefore propose a text-to-image synthesis approach named Bridge-like Generative Adversarial Networks (Bridge-GAN). Its main contributions are: (1) A transitional space is established as a bridge for improving content consistency, in which the interpretable representation is learned by ensuring that the key visual information from the given text descriptions is retained. (2) A ternary mutual information objective is designed to optimize the transitional space and to enhance both visual reality and content consistency. It is proposed with the goal of disentangling the latent factors conditioned on the text description, for further interpretable representation learning. Comprehensive experiments on two widely used datasets verify the effectiveness of Bridge-GAN, which achieves the best performance.

