Abstract

Text-to-image generation has broad applications in fields such as scene retrieval and computer-aided design. Existing approaches can generate realistic images from simple text descriptions, but images rendered from complex descriptions remain unsatisfactory for practical use. To generate accurate high-resolution images from complex texts, we propose an attention-enhancing adversarial learning network (Attn-Eh ALN) built on conditional generative adversarial networks and the attention mechanism. The model consists of an encoding module and a generative module. In the encoding module, we propose a local-attention-driven encoding network that assigns different weights to the words in the text, enhancing the semantic representation of specific object features. The attention mechanism captures finer details while preserving global information, so the details in the generated images become more fine-grained. In the discriminating stage, we employ multiple discriminators to judge the realness of the generated images, avoiding the bias introduced by a single discriminator. Moreover, a semantic similarity judgment module is introduced to improve the semantic consistency between the text description and the visual content. Experimental results on benchmark datasets show that Attn-Eh ALN compares favorably with other state-of-the-art methods in both qualitative and quantitative assessments.
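To make the abstract's two key ideas concrete, below is a minimal sketch in PyTorch of (1) word-level attention that weights individual words against image region features, in the style of AttnGAN-type attention, and (2) a cosine-similarity stand-in for the semantic similarity judgment module. All names, dimensions, and the loss form are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WordAttention(nn.Module):
    """Attend over word features for each image region, so words that
    describe specific object details receive larger weights."""
    def __init__(self, word_dim: int, region_dim: int):
        super().__init__()
        # Project word features into the image-region feature space.
        self.proj = nn.Linear(word_dim, region_dim, bias=False)

    def forward(self, words: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # words:   (B, T, word_dim)   word-level text features
        # regions: (B, N, region_dim) image region features
        w = self.proj(words)                            # (B, T, region_dim)
        scores = torch.bmm(regions, w.transpose(1, 2))  # (B, N, T)
        attn = F.softmax(scores, dim=-1)                # per-region weight of each word
        context = torch.bmm(attn, w)                    # (B, N, region_dim) word context
        return context

def semantic_consistency_loss(sent_emb: torch.Tensor, img_emb: torch.Tensor) -> torch.Tensor:
    # Encourage the global image feature to align with the sentence embedding;
    # a simple cosine-similarity proxy for the semantic judgment module.
    return 1.0 - F.cosine_similarity(sent_emb, img_emb, dim=-1).mean()

if __name__ == "__main__":
    B, T, N, D = 2, 12, 64, 256
    attn = WordAttention(word_dim=300, region_dim=D)
    words = torch.randn(B, T, 300)      # random stand-ins for encoder outputs
    regions = torch.randn(B, N, D)
    ctx = attn(words, regions)          # (2, 64, 256)
    loss = semantic_consistency_loss(torch.randn(B, D), torch.randn(B, D))
    print(ctx.shape, loss.item())

In this reading, the attention output supplies per-region word context to the generative module, while the consistency term penalizes images whose global features drift from the sentence semantics; the multi-discriminator objective would simply average adversarial losses over several discriminators.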
