Abstract

To capture complex semantic relations in images, we propose a Reverse Generative Adversarial Network (ReverseGAN) that uses a generative task as guidance to build an image captioning system. The system learns caption generation from regenerated images, with a generative adversarial network as its overall framework. The generator encodes images with a graph convolutional neural network and uses a decoder model to convert the resulting image vectors into captions. The reverse text-to-image task serves as the discriminator model. The discriminator's text embedding module maps the captions produced by the generator to local word-level features and a global sentence feature. A cascading attention module then uses these embeddings to synthesize images from coarse to fine, combining the global and local text features so that the regenerated images stay close to the originals. On the MSCOCO dataset, our model outperforms current state-of-the-art methods on the BLEU, METEOR, and ROUGE metrics.
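The following is a minimal structural sketch of the generator/discriminator pairing the abstract describes: a GCN encoder feeding a caption decoder, and a text-to-image discriminator that embeds the caption at word and sentence level before regenerating an image coarse-to-fine. All class names, dimensions, and layer choices here are assumptions for illustration; the paper does not publish an implementation.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """One graph-convolution layer over detected region features (simplified)."""
    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, nodes, adj):
        # nodes: (B, N, in_dim); adj: (B, N, N) row-normalized adjacency
        return torch.relu(adj @ self.proj(nodes))

class CaptionGenerator(nn.Module):
    """Generator: GCN image encoder + LSTM caption decoder (illustrative)."""
    def __init__(self, vocab=10000, dim=512):
        super().__init__()
        self.encoder = GCNEncoder(out_dim=dim)
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, nodes, adj, captions):
        img_vec = self.encoder(nodes, adj).mean(dim=1)   # (B, dim) pooled graph
        h0 = img_vec.unsqueeze(0)                        # condition decoder on image
        states, _ = self.lstm(self.embed(captions), (h0, torch.zeros_like(h0)))
        return self.out(states)                          # (B, T, vocab) logits

class TextToImageDiscriminator(nn.Module):
    """Discriminator: embeds the generated caption into word-level and
    sentence-level features, then regenerates an image coarse-to-fine
    (reduced here to a single upsampling stage)."""
    def __init__(self, vocab=10000, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)            # local word features
        self.sent = nn.GRU(dim, dim, batch_first=True)   # global sentence feature
        self.to_coarse = nn.Linear(dim, 64 * 8 * 8)
        self.refine = nn.Sequential(                     # one coarse-to-fine step
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
        )

    def forward(self, caption_ids):
        words = self.embed(caption_ids)                  # (B, T, dim)
        _, sent = self.sent(words)                       # (1, B, dim)
        coarse = self.to_coarse(sent.squeeze(0)).view(-1, 64, 8, 8)
        return self.refine(coarse)                       # (B, 3, 16, 16) image
```

In this sketch the adversarial signal would come from comparing the regenerated image against the original (e.g. with an L1 or perceptual loss); the paper's cascading attention module, which fuses word-level and sentence-level features at each refinement scale, is omitted here for brevity.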
