Abstract

Synthesizing high-resolution, realistic images from text descriptions with a single-iteration Generative Adversarial Network (GAN) is difficult without additional techniques, because blurry artifacts and mode collapse commonly occur. To mitigate these problems, this paper proposes an Iterative Generative Adversarial Network (iGAN) that takes three iterations to synthesize a high-resolution, realistic image from its text description. In the \(1^{st}\) iteration, the GAN synthesizes a low-resolution \(64 \times 64\) pixel image with the basic shape and colors from the text description, with reduced mode collapse and blurry artifacts. In the \(2^{nd}\) iteration, the GAN takes the result of the \(1^{st}\) iteration together with the text description again and synthesizes a higher-resolution \(128 \times 128\) pixel image with better shape and color, with far fewer mode collapse and blurry-artifact problems. In the last iteration, the GAN takes the result of the \(2^{nd}\) iteration along with the text description and synthesizes a high-resolution \(256 \times 256\) image with well-defined shape and clear detail, with almost no mode collapse or blurry artifacts. Our proposed iGAN shows significant performance on the CUB birds and Oxford-102 flowers datasets. Moreover, iGAN improves the inception score and human rank compared with other state-of-the-art methods.
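The three-iteration pipeline described above can be sketched as follows. This is a minimal structural sketch, not the authors' architecture: `text_encoder` and `gan_stage` are hypothetical placeholders standing in for the real embedding network and stage generators, and only the stage chaining and output resolutions (\(64 \to 128 \to 256\)) follow the abstract.

```python
# Hypothetical sketch of the three-iteration iGAN pipeline.
# text_encoder and gan_stage are placeholder names, not the paper's modules;
# only the stage chaining and resolutions (64 -> 128 -> 256) follow the text.
import random


def text_encoder(description):
    # Placeholder: map a text description to a fixed-length embedding.
    random.seed(hash(description) % (2**32))
    return [random.random() for _ in range(128)]


def gan_stage(text_embedding, prior_image, out_res):
    # Placeholder generator: each stage conditions on the text embedding
    # and the previous stage's output image, then emits an
    # out_res x out_res image (here, a zero-filled grid).
    return [[0.0] * out_res for _ in range(out_res)]


def igan(description):
    emb = text_encoder(description)
    img64 = gan_stage(emb, None, 64)       # iteration 1: basic shape and color
    img128 = gan_stage(emb, img64, 128)    # iteration 2: refined shape and color
    img256 = gan_stage(emb, img128, 256)   # iteration 3: final high-resolution image
    return img256
```

Each stage re-reads the text description while conditioning on the previous stage's image, which is what distinguishes this iterative scheme from a single-pass generator.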
