Abstract

Generative adversarial networks (GANs) have demonstrated strong performance on image generation tasks, and a large number of studies have extended image generation models to video generation as well. However, because of the complexity of video generation, applying GANs in the video domain is far from trivial: the generated content must be both spatially and temporally coherent. Generating videos from text is even more challenging, since semantic consistency between the input text and the video must be maintained in addition to spatial and temporal coherence. In this paper, we compare three recently proposed text-to-video GAN architectures. The first is the Text-Filter conditioning Generative Adversarial Network (TFGAN), which employs an effective feature-fusion scheme in which discriminative convolutional filters are generated from the text features and then convolved with the image features in the discriminator. The second is the Introspective Recurrent Convolutional GAN (IRC-GAN), which leverages mutual-information introspection to maintain semantic consistency between the generated videos and the input text. The third is the Bottom-up GAN (BoGAN), which introduces losses at three levels: a region-level loss, a frame-level loss, and a video-level loss.
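
To make the TFGAN-style fusion scheme concrete, the sketch below shows one way text-conditioned convolutional filters can be generated from a sentence embedding and convolved with the discriminator's image features. This is a minimal illustration in PyTorch, not the authors' reference implementation; the module name, layer sizes, and tensor shapes are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextFilterFusion(nn.Module):
    """Minimal sketch of text-conditioned filter fusion (TFGAN-style).

    A text embedding is mapped to a bank of convolutional filters, which
    are then convolved with the discriminator's image feature map. All
    sizes here are illustrative, not taken from the paper.
    """

    def __init__(self, text_dim=256, feat_channels=64, n_filters=32, k=3):
        super().__init__()
        self.n_filters = n_filters
        self.k = k
        # Predict one filter bank (n_filters x feat_channels x k x k) per caption.
        self.filter_gen = nn.Linear(text_dim, n_filters * feat_channels * k * k)

    def forward(self, img_feat, text_emb):
        # img_feat: (B, C, H, W) image features from the discriminator backbone
        # text_emb: (B, text_dim) sentence embedding of the caption
        b, c, h, w = img_feat.shape
        filters = self.filter_gen(text_emb).view(b * self.n_filters, c, self.k, self.k)
        # Grouped convolution applies each sample's own filter bank to its own feature map.
        out = F.conv2d(img_feat.view(1, b * c, h, w), filters,
                       padding=self.k // 2, groups=b)
        return out.view(b, self.n_filters, h, w)


# Usage: fuse caption-conditioned filters with image features before scoring real/fake.
fusion = TextFilterFusion()
img_feat = torch.randn(4, 64, 16, 16)   # dummy discriminator features
text_emb = torch.randn(4, 256)          # dummy caption embeddings
fused = fusion(img_feat, text_emb)      # -> (4, 32, 16, 16)
```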
