We propose a novel method that trains a conditional Generative Adversarial Network (GAN) to generate visual explanations of a Convolutional Neural Network (CNN). Specifically, a conditional GAN (cGAN) is trained with information on how the CNN processes an image when making predictions. The approach poses two main challenges: how to represent this information in a form that can be fed to a cGAN, and how to use that representation effectively to train the explanation model. To tackle these challenges, we devised a representation of CNN architectures by cumulatively averaging intermediate Grad-CAM interpretation maps. In the model, Spatial Feature Transform (SFT) layers feed these CNN representations into the GAN. Experimental results show that our approach learned general aspects of CNNs and was agnostic to both datasets and CNN architectures. The study includes qualitative and quantitative evaluations and compares the proposed approach with state-of-the-art methods. We found that the initial and final layers of a CNN are equally crucial for explaining it. We believe that training a GAN to explain CNNs opens the door to improved interpretations that leverage fast-paced advances in deep learning. The code used for experimentation is publicly available at https://github.com/Akash-guna/Explain-CNN-With-GANS.
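To make the representation step concrete, the cumulative averaging of intermediate Grad-CAM maps could be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `cumulative_average_maps` is hypothetical, and it assumes the per-layer Grad-CAM maps have already been computed and resized to a common spatial resolution.

```python
import numpy as np

def cumulative_average_maps(gradcam_maps):
    """Cumulatively average a sequence of per-layer Grad-CAM maps.

    gradcam_maps: list of 2-D arrays (H, W), one per CNN layer, ordered
    from the earliest to the deepest layer. Returns a list in which the
    i-th entry is the mean of maps 0..i, min-max normalised to [0, 1]
    so maps accumulated over different depths remain comparable.
    """
    running_sum = np.zeros_like(gradcam_maps[0], dtype=np.float64)
    averaged = []
    for count, fmap in enumerate(gradcam_maps, start=1):
        running_sum += fmap
        avg = running_sum / count  # average over layers seen so far
        rng = avg.max() - avg.min()
        # guard against a constant map (zero range) before normalising
        averaged.append((avg - avg.min()) / rng if rng > 0 else avg)
    return averaged
```

Each entry of the result summarises how the CNN's attention has accumulated up to that depth, which is the kind of architecture-level signal the abstract describes feeding to the cGAN through SFT layers.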