Generative adversarial network for semi-supervised image captioning

Xu Liang, Chen Li, Lihua Tian

doi:10.1016/j.cviu.2024.104199

Abstract

Traditional supervised image captioning methods usually rely on a large number of images with paired captions for training. However, creating such datasets requires considerable time and human effort. We therefore propose a new semi-supervised image captioning algorithm to address this problem. The proposed method uses a generative adversarial network to generate images that match captions, and uses these generated images and their captions as new training data. This avoids the error accumulation that arises when pseudo captions are generated with an autoregressive method, and it allows the network to be trained directly with backpropagation. To ensure that the generated images remain correlated with their captions, we introduce the CLIP model as a constraint. Because CLIP has been pre-trained on a large amount of image–text data, it performs very well at semantic alignment of images and text. To verify the effectiveness of our method, we evaluate it on the MSCOCO offline “Karpathy” test split. Experimental results show that our method significantly improves model performance when only 1% of the paired data is used, with the CIDEr score increasing from 69.5% to 77.7%. This shows that our method can effectively exploit unlabeled data for image captioning tasks.
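
The abstract does not state the exact form of the CLIP constraint. As a rough illustration only, one common way to score image–text agreement is the cosine similarity between an image embedding and a text embedding, turned into a loss as one minus that similarity. The sketch below assumes this form; the toy vectors stand in for CLIP encoder outputs, and the function names `cosine_similarity` and `alignment_loss` are hypothetical, not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(image_embeddings, text_embeddings):
    """Mean of (1 - cosine similarity) over paired image/text embeddings.

    Assumed loss form for illustration; the paper's actual constraint
    may differ.
    """
    losses = [
        1.0 - cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs (hypothetical values).
image_embeddings = [[1.0, 0.0], [0.0, 1.0]]
text_embeddings = [[1.0, 0.0], [1.0, 0.0]]

# First pair is perfectly aligned, second pair is orthogonal.
loss = alignment_loss(image_embeddings, text_embeddings)
print(loss)
```

In a setup like the one the abstract describes, a term of this kind would presumably be added to the generator's objective so that generated images whose embeddings drift away from their captions are penalized.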
