Abstract

Despite the promising progress made in recent years, automatically generating high-resolution, realistic images from text descriptions remains a challenging task due to the semantic gap between human-written descriptions and the diversity of visual appearance. Most existing approaches generate only rough images from the given text descriptions, without holistically exploiting the relationship between sentence semantics and visual content. In this paper, we propose a novel chained deep recurrent generative adversarial network (CDRGAN) for synthesizing images from text descriptions. Our model uses carefully designed chained deep recurrent generators that simultaneously recover global image structures and local details. Specifically, our method not only considers the logical relationships among image pixels, but also removes computational bottlenecks through parameter sharing. We evaluate our method on three public benchmarks: the CUB, Oxford-102, and MS COCO datasets. Experimental results show that our method consistently outperforms state-of-the-art approaches across different evaluation metrics.
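
To make the abstract's central idea concrete, the following is a minimal PyTorch-style sketch of a generator chain in which a single refinement block is reused (parameter sharing) to progressively upsample a coarse, text-conditioned image into finer ones. The stage count, feature dimensions, and layer choices are illustrative assumptions, not the paper's actual CDRGAN architecture.

```python
import torch
import torch.nn as nn

class ChainedRecurrentGenerator(nn.Module):
    """Hypothetical sketch: a chain of generator stages sharing one
    refinement block that upsamples a coarse image into finer ones."""

    def __init__(self, text_dim=128, noise_dim=100, base_ch=64, stages=3):
        super().__init__()
        self.stages = stages
        self.base_ch = base_ch
        # Stage 0: map text embedding + noise to a coarse 16x16 feature map.
        self.fc = nn.Linear(text_dim + noise_dim, base_ch * 16 * 16)
        # Shared refinement block applied recurrently (parameter sharing):
        # each application upsamples 2x and refines the features.
        self.refine = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(base_ch, base_ch, 3, padding=1),
            nn.BatchNorm2d(base_ch),
            nn.ReLU(inplace=True),
        )
        # One image head per resolution (coarse to fine).
        self.to_img = nn.ModuleList(
            nn.Conv2d(base_ch, 3, 3, padding=1) for _ in range(stages + 1)
        )

    def forward(self, text_emb, noise):
        h = self.fc(torch.cat([text_emb, noise], dim=1))
        h = h.view(-1, self.base_ch, 16, 16)
        images = [torch.tanh(self.to_img[0](h))]          # 16x16 coarse output
        for i in range(self.stages):
            h = self.refine(h)                            # same weights reused each step
            images.append(torch.tanh(self.to_img[i + 1](h)))
        return images                                     # 16, 32, 64, 128 px outputs


if __name__ == "__main__":
    g = ChainedRecurrentGenerator()
    imgs = g(torch.randn(2, 128), torch.randn(2, 100))
    print([tuple(im.shape) for im in imgs])
```

Because every refinement step reuses the same weights, the chained generator grows the output resolution without a proportional growth in parameters, which is one plausible reading of the computational-bottleneck claim above.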
