Abstract

Generating realistic images from text descriptions is a challenging problem in computer vision. Although previous works have shown remarkable progress, guaranteeing semantic consistency between text descriptions and synthesized images remains difficult. To generate semantically consistent images, we propose a novel Textual-Visual Bidirectional Generative Adversarial Network (TVBi-GAN) with two semantics-enhanced modules: a semantics-enhanced attention module and a semantics-enhanced batch normalization module. These modules improve the semantic consistency of synthesized images by incorporating more precise semantic features. Moreover, we propose an encoder network that extracts semantic features from images; during the adversarial process, this encoder guides the generator to explore the corresponding features behind descriptions. Extensive experiments on the CUB and COCO datasets demonstrate that TVBi-GAN outperforms state-of-the-art methods.
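The abstract does not specify the internals of the semantics-enhanced batch normalization module. As one plausible reading, such a layer can be sketched as conditional batch normalization whose scale and shift are predicted from a sentence embedding rather than learned as fixed parameters, so that text semantics modulate the generator's feature maps. The sketch below is an illustrative assumption, not the paper's implementation; the class name SemanticBatchNorm2d, the embedding dimension, and the residual (1 + gamma) formulation are all hypothetical choices.

```python
import torch
import torch.nn as nn

class SemanticBatchNorm2d(nn.Module):
    """Hedged sketch of a semantics-conditioned batch normalization layer.

    The affine scale (gamma) and shift (beta) of BatchNorm are predicted
    from a sentence embedding, so text semantics modulate the normalized
    feature map. Names and dimensions are illustrative assumptions.
    """

    def __init__(self, num_features: int, embed_dim: int):
        super().__init__()
        # Parameter-free normalization; gamma/beta come from the condition.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Linear(embed_dim, num_features)  # predicted scale
        self.beta = nn.Linear(embed_dim, num_features)   # predicted shift

    def forward(self, x: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) generator feature map; sent_emb: (B, embed_dim)
        normalized = self.bn(x)
        gamma = self.gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.beta(sent_emb).unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        # Residual formulation keeps the layer close to identity at init.
        return (1 + gamma) * normalized + beta

if __name__ == "__main__":
    layer = SemanticBatchNorm2d(num_features=64, embed_dim=256)
    feats = torch.randn(4, 64, 32, 32)  # batch of feature maps
    sent = torch.randn(4, 256)          # batch of sentence embeddings
    print(layer(feats, sent).shape)     # torch.Size([4, 64, 32, 32])
```

Conditioning the normalization statistics on the text embedding, as in this sketch, is a common way to inject semantics at every generator stage, which is consistent with the abstract's claim that the module improves consistency by involving semantic features.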
