Abstract

Generating a series of images that depicts a multi-sentence story is a challenging task in computer vision because it requires both image-level precision and story-level consistency. Existing methods usually focus on consistency between images at the cost of fine-grained object details. We propose a progressive adversarial learning algorithm, termed HR-PrGAN, that generates high-resolution image sequences with rich details by decomposing the generation problem into multiple stages. Specifically, HR-PrGAN has two stages. The Coarse-grain Stage generates a series of coherent coarse-grained images from both the story and context embeddings, and an additional unconditional loss is proposed to restrict their deformation and preserve object contours and layouts. The Refinement Stage then refines these coarse-grained images by injecting the story-level text embeddings while preserving image-level details via a Coarse-grained feature Supplementary Module (CSM). We evaluate the proposed method on two commonly used datasets, CLEVR-SV and PororoSV. Extensive experiments demonstrate that the proposed model significantly outperforms state-of-the-art methods in terms of image anti-deformation, fine-grained feature synthesis, and human-perception-based image quality evaluation.
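
The two-stage data flow described in the abstract can be pictured with a toy sketch. This is only an illustration under loud assumptions, not the HR-PrGAN implementation: the function names, the running-blend context update, the nearest-neighbour upsampling, and the weighted-sum stand-in for the CSM are all hypothetical, and the adversarial training (including the unconditional loss) is omitted entirely.

```python
from typing import List

Vector = List[float]
Image = List[List[float]]  # toy single-channel image


def coarse_stage(sentences: List[Vector], story: Vector, size: int = 4) -> List[Image]:
    """Stage 1 (toy): one low-resolution image per sentence."""
    context = [0.0] * len(story)
    images = []
    for sent in sentences:
        # Hypothetical context update: running blend of the sentences seen so far.
        context = [0.5 * c + 0.5 * s for c, s in zip(context, sent)]
        # Condition each image on both the context and the story embedding.
        value = (sum(context) + sum(story)) / (2 * len(story))
        images.append([[value] * size for _ in range(size)])
    return images


def upsample2x(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling (an assumed choice)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def csm(refined: Image, coarse_up: Image, alpha: float = 0.5) -> Image:
    """Hypothetical stand-in for the CSM: blend coarse features back in."""
    return [[alpha * r + (1 - alpha) * c for r, c in zip(rr, cr)]
            for rr, cr in zip(refined, coarse_up)]


def refinement_stage(coarse: List[Image], story: Vector) -> List[Image]:
    """Stage 2 (toy): upsample, inject the story embedding, re-supply coarse features."""
    story_bias = sum(story) / len(story)
    results = []
    for img in coarse:
        up = upsample2x(img)
        detailed = [[p + story_bias for p in row] for row in up]
        results.append(csm(detailed, up))
    return results


# Three sentence embeddings and one story embedding (made-up values).
sentences = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
story = [0.3, 0.4]
coarse = coarse_stage(sentences, story)
fine = refinement_stage(coarse, story)
print(len(fine), len(fine[0]), len(fine[0][0]))  # 3 8 8
```

The sketch only shows the shape of the pipeline: every sentence yields one coarse image, and the second stage doubles its resolution while the coarse features are fed back in rather than discarded.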
