Abstract

Despite the promising progress made in recent years, automatically generating high-resolution, realistic images from text descriptions remains a challenging task due to the semantic gap between human-written descriptions and the diversity of visual appearance. Most existing approaches generate only rough images from the given text descriptions, without holistically exploiting the relationship between sentence semantics and visual content. In this paper, we propose a novel chained deep recurrent generative adversarial network (CDRGAN) for synthesizing images from text descriptions. Our model uses carefully designed chained deep recurrent generators that simultaneously recover global image structures and local details. Specifically, our method not only considers the logical relationships among image pixels, but also removes computational bottlenecks through parameter sharing. We evaluate our method on three public benchmarks: the CUB, Oxford-102, and MS COCO datasets. Experimental results show that our method consistently outperforms state-of-the-art approaches across different evaluation metrics.
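
To make the abstract's central idea concrete, the following is a minimal PyTorch-style sketch of a generator chain in which a single refinement block is reused (parameter sharing) to progressively upsample a coarse, text-conditioned image into finer ones. The stage count, feature dimensions, and layer choices are illustrative assumptions, not the paper's actual CDRGAN architecture.

```python
import torch
import torch.nn as nn

class ChainedRecurrentGenerator(nn.Module):
    """Hypothetical sketch: a chain of generator stages sharing one
    refinement block that upsamples a coarse image into finer ones."""

    def __init__(self, text_dim=128, noise_dim=100, base_ch=64, stages=3):
        super().__init__()
        self.stages = stages
        self.base_ch = base_ch
        # Stage 0: map text embedding + noise to a coarse 16x16 feature map.
        self.fc = nn.Linear(text_dim + noise_dim, base_ch * 16 * 16)
        # Shared refinement block applied recurrently (parameter sharing):
        # each application upsamples 2x and refines the features.
        self.refine = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(base_ch, base_ch, 3, padding=1),
            nn.BatchNorm2d(base_ch),
            nn.ReLU(inplace=True),
        )
        # One image head per resolution (coarse to fine).
        self.to_img = nn.ModuleList(
            nn.Conv2d(base_ch, 3, 3, padding=1) for _ in range(stages + 1)
        )

    def forward(self, text_emb, noise):
        h = self.fc(torch.cat([text_emb, noise], dim=1))
        h = h.view(-1, self.base_ch, 16, 16)
        images = [torch.tanh(self.to_img[0](h))]          # 16x16 coarse output
        for i in range(self.stages):
            h = self.refine(h)                            # same weights reused each step
            images.append(torch.tanh(self.to_img[i + 1](h)))
        return images                                     # 16, 32, 64, 128 px outputs


if __name__ == "__main__":
    g = ChainedRecurrentGenerator()
    imgs = g(torch.randn(2, 128), torch.randn(2, 100))
    print([tuple(im.shape) for im in imgs])
```

Because every refinement step reuses the same weights, the chained generator grows the output resolution without a proportional growth in parameters, which is one plausible reading of the computational-bottleneck claim above.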
