Abstract

The main challenge in text-to-image (Txt2Img) synthesis is to enrich image details while preserving semantic consistency between the text description and the generated image. To tackle this challenge, we propose a dual attention-guided state controller (DASC) mechanism for Txt2Img synthesis in this paper. Unlike conventional approaches that use average pooling to extract global semantic information, the proposed approach computes a new word-to-visual (W2V) attention together with the conventional visual-to-word (V2W) attention to form a dual attention. At each image generation stage, this dual attention extracts the local semantic information most relevant to each word and, in a recurrent manner, controls the subsequent image generation states. Guided by the local semantic information extracted by the dual attention, the proposed state controller dynamically boosts the importance of mismatched words and controls the image generation states so that the generated image is refined with rich details. Experiments on the benchmark CUB and MS-COCO datasets demonstrate the superior performance of the proposed approach in Txt2Img synthesis.
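
To make the mechanism concrete, below is a minimal sketch of how such a dual attention could be computed, in the spirit of common region-word attention formulations for Txt2Img models. All names, tensor shapes, and the `importance_boost` heuristic are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def dual_attention(word_feats: torch.Tensor, visual_feats: torch.Tensor):
    """Sketch of the dual attention described above (shapes are assumptions).

    word_feats:   (B, T, D) per-word text features
    visual_feats: (B, N, D) per-region image features, N = H * W
    """
    # Word-region similarity matrix: (B, T, N).
    sim = torch.bmm(word_feats, visual_feats.transpose(1, 2))

    # Conventional visual-to-word (V2W) attention: each image region
    # attends over the words, giving a word context per region.
    v2w = F.softmax(sim, dim=1)                               # normalize over words
    v2w_context = torch.bmm(v2w.transpose(1, 2), word_feats)  # (B, N, D)

    # Proposed word-to-visual (W2V) attention: each word attends over
    # the image regions, extracting the local semantic information
    # most relevant to that word.
    w2v = F.softmax(sim, dim=2)                               # normalize over regions
    w2v_context = torch.bmm(w2v, visual_feats)                # (B, T, D)

    return v2w_context, w2v_context, sim

def importance_boost(sim: torch.Tensor) -> torch.Tensor:
    """One hypothetical reading of 'dynamic importance boosting': words
    whose best region match is weak (i.e., mismatched words) receive a
    larger weight, so later stages can attend to them more strongly."""
    best_match = sim.max(dim=2).values      # (B, T): best region score per word
    return F.softmax(-best_match, dim=1)    # weaker match -> higher weight
```

In this sketch, the per-word weights from `importance_boost` could modulate the W2V context before it feeds the state controller of the next refinement stage; this is one plausible realization under the stated assumptions, not the authors' exact design.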
