One of the primary applications of recent conditional generative models is the synthesis of images from natural language descriptions. Besides testing the ability of these models to capture conditional, high-dimensional distributions, text-to-image synthesis has several exciting practical applications, such as photo editing and computer-aided content creation. Recent progress in this area has been made using Generative Adversarial Networks (GANs). This project begins with a gentle introduction to these topics and a discussion of the current state-of-the-art models. It then presents a distinct deep architecture and GAN formulation that effectively bridges these advances in text and image modelling, translating visual concepts from characters to pixels, and demonstrates the capability of this model to generate plausible images of birds and flowers from detailed text descriptions. In addition, it proposes Wasserstein GAN-CLS, a new model for conditional image generation based on the Wasserstein distance, which offers stability guarantees. It then shows how the novel loss function of Wasserstein GAN-CLS can be used in a Conditional Progressive Growing GAN. Combined with the proposed loss, this model improves by 7.07% the best Inception Score (on the Caltech-UCSD Birds dataset) among models that use only sentence-level visual semantics. The only model that performs better than the Conditional Wasserstein Progressive Growing GAN is the recently proposed AttnGAN, which also uses word-level visual semantics.
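To make the GAN-CLS idea mentioned above concrete, the following is a minimal sketch of the matching-aware discriminator loss from the original GAN-CLS formulation (Reed et al., 2016), on which the Wasserstein variant builds. The scalar inputs are hypothetical stand-ins for batch-averaged discriminator probabilities; the function names are illustrative, not from the project's code.

```python
import math

def gan_cls_discriminator_loss(d_real_match, d_real_mismatch, d_fake_match):
    """Matching-aware (GAN-CLS) discriminator loss sketch.

    The discriminator scores three kinds of (image, text) pairs:
      - real image + matching text     -> should be classified as real
      - real image + mismatching text  -> should be classified as fake
      - fake image + matching text     -> should be classified as fake
    All arguments are discriminator output probabilities in (0, 1).
    """
    loss_real = -math.log(d_real_match)
    # Mismatched and synthetic pairs share the "fake" label; their
    # losses are averaged, as in the GAN-CLS objective.
    loss_mismatch = -math.log(1.0 - d_real_mismatch)
    loss_fake = -math.log(1.0 - d_fake_match)
    return loss_real + 0.5 * (loss_mismatch + loss_fake)
```

A confident discriminator (e.g. `gan_cls_discriminator_loss(0.9, 0.1, 0.1)`) yields a small loss, while uncertain scores near 0.5 yield a larger one; the Wasserstein variant replaces these log-probability terms with unbounded critic scores to obtain its stability guarantees.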