HR-PrGAN: High-resolution story visualization with progressive generative adversarial networks

Pei Dong, Lei Wu, Lei Meng, Xiangxu Meng

doi:10.1016/j.ins.2022.10.083

Abstract

Generating a series of images that depicts a multi-sentence story is a challenging task in computer vision, because it requires both image-level precision and story-level consistency. Existing methods usually focus on consistency between images at the cost of fine-grained object details. We propose a progressive adversarial learning algorithm, termed HR-PrGAN, that produces high-resolution image sequences with rich details by decomposing the generation problem into multiple stages. Specifically, HR-PrGAN has two stages. The Coarse-grain Stage generates a series of coherent coarse-grained images from both the story and context embeddings, and an additional unconditional loss is proposed to restrict their deformation and preserve object contours and layouts. The Refinement Stage then refines this series of coarse-grained images by injecting the story-level text embeddings, while preserving image-level details via a Coarse-grained feature Supplementary Module (CSM). We evaluate the proposed method on two commonly used datasets, CLEVR-SV and PororoSV. Extensive experiments demonstrate that the proposed model significantly outperforms state-of-the-art methods in terms of image anti-deformation, fine-grained feature synthesis, and human-perception-based image quality evaluation.
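To make the two-stage data flow described in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It only mirrors the order of operations: a coarse stage conditioned on story and context embeddings, followed by a refinement stage that upsamples, injects a story-level signal, and blends the coarse features back in as a stand-in for the CSM. All function names, the arithmetic inside each stage, and the parameters `size`, `factor`, and `csm_weight` are hypothetical; the real model uses learned networks trained adversarially.

```python
def coarse_stage(story_emb, context_embs, size=4):
    """Toy stand-in for the Coarse-grain Stage: one low-res image per sentence.

    Assumption: each frame depends on the shared story embedding and on its
    own context embedding. The formula is a placeholder, not a learned generator.
    """
    images = []
    for ctx in context_embs:
        signal = sum(story_emb) + sum(ctx)
        images.append([[0.01 * signal * (r + c) for c in range(size)]
                       for r in range(size)])
    return images


def upsample(img, factor=2):
    """Nearest-neighbour upsampling of a 2-D list of floats."""
    height, width = len(img) * factor, len(img[0]) * factor
    return [[img[r // factor][c // factor] for c in range(width)]
            for r in range(height)]


def refinement_stage(coarse_images, story_emb, factor=2, csm_weight=0.5):
    """Toy stand-in for the Refinement Stage.

    Each coarse image is upsampled, shifted by a story-level signal
    (hypothetical refinement), and then blended with the upsampled coarse
    features. The blend imitates the role of the CSM: keeping coarse
    contours and layout in the high-resolution output.
    """
    story_bias = sum(story_emb) / len(story_emb)
    refined = []
    for img in coarse_images:
        up = upsample(img, factor)
        new = [[v + 0.1 * story_bias for v in row] for row in up]
        blended = [[(1 - csm_weight) * n + csm_weight * u
                    for n, u in zip(new_row, up_row)]
                   for new_row, up_row in zip(new, up)]
        refined.append(blended)
    return refined


# Usage: a three-sentence story yields three coarse and three refined frames.
story = [0.1, 0.2, 0.3]                       # hypothetical story embedding
contexts = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]  # one context vector per sentence
coarse = coarse_stage(story, contexts)
refined = refinement_stage(coarse, story)
```

In this sketch, setting `csm_weight` to 1.0 returns the upsampled coarse image unchanged, while 0.0 discards the coarse features entirely; the abstract suggests the actual CSM sits between these extremes so that refinement adds detail without distorting the coarse layout.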

Full Text