A semi-supervised deep learning image caption model based on Pseudo Label and N-gram

Cheng Cheng, Chunping Li, Youfang Han, Yan Zhu

doi:10.1016/j.ijar.2020.12.016

Abstract

Image captioning is an important application of artificial intelligence. A machine that can describe a picture as reasonably as a human demonstrates a higher level of image understanding. However, for complex machine learning tasks such as image captioning, data annotation is time-consuming and laborious. In a new application scenario, annotated data are usually scarce, which results in poor model performance. The large amount of easily available unlabeled image data makes semi-supervised learning of image captioning feasible. Building on the existing end-to-end deep learning paradigm, this paper proposes a semi-supervised deep learning method called the N-gram + Pseudo Label NIC method. The method combines a current mainstream deep neural network model, the NIC (Neural Image Caption) model, with the semi-supervised deep learning idea of pseudo labels and with N-grams. It generates pseudo labels with an N-gram Search algorithm and improves the model by exploiting the prior knowledge in the N-gram table and people's descriptive habits. Under the BLEU-1 evaluation criterion, the method achieves better results than the original NIC model on subsets of size 0.5k, 1k, 2k and 3k of the Flickr 8K and MSCOCO data sets.
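To make the idea of an N-gram table concrete, the following minimal sketch shows one way such a table could be built from labeled captions and used to choose a pseudo label among candidate captions for an unlabeled image. It is an illustration under stated assumptions, not the paper's N-gram Search algorithm: the bigram order, the add-one smoothing, and the function names `build_bigram_table`, `bigram_score` and `pick_pseudo_label` are all hypothetical, and the candidate captions are assumed to come from a captioning model such as NIC.

```python
import math
from collections import Counter


def build_bigram_table(captions):
    """Count bigrams and their left contexts over whitespace-tokenized captions.

    Assumption: a bigram table stands in for the paper's N-gram table.
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for caption in captions:
        tokens = ["<s>"] + caption.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def bigram_score(caption, contexts, bigrams, vocab_size):
    """Average add-one-smoothed log-probability per bigram (hypothetical scoring rule)."""
    tokens = ["<s>"] + caption.lower().split() + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for prev, cur in pairs:
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(prob)
    return total / len(pairs)


def pick_pseudo_label(candidates, contexts, bigrams, vocab_size):
    """Return the candidate caption that best matches the bigram statistics."""
    return max(
        candidates,
        key=lambda c: bigram_score(c, contexts, bigrams, vocab_size),
    )


# Toy labeled captions and candidate captions for one unlabeled image.
labeled = [
    "a dog runs on the grass",
    "a dog plays with a ball",
    "a man rides a bike",
]
candidates = ["a dog runs on the grass", "grass the on dog a runs"]

table = build_bigram_table(labeled)
pseudo_label = pick_pseudo_label(candidates, *table)
```

In this toy setting the fluent candidate is preferred over the scrambled one, because all of its bigrams appear in the labeled captions. The selected caption would then be paired with the unlabeled image as a pseudo label for further training.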

Full Text