Abstract
Image captioning is an important application of artificial intelligence. A machine that can describe a picture as reasonably as a human demonstrates a higher level of image understanding. However, for complex machine learning tasks such as image captioning, data annotation is time-consuming and laborious. In a new application scenario, annotated data are usually scarce, which results in poor model performance. The large amount of readily available unlabeled image data makes semi-supervised learning for image captioning possible. Building on the existing end-to-end deep learning paradigm, this paper proposes a semi-supervised deep learning method called the N-gram + Pseudo Label NIC method. The method combines a current mainstream deep neural network model, the NIC (Neural Image Caption) model, with the semi-supervised idea of pseudo labels and with N-gram statistics. It generates pseudo labels with an N-gram Search algorithm and improves the model by exploiting the prior knowledge in the N-gram table and people's descriptive habits. Under the BLEU-1 evaluation criterion, the method achieves better results than the original NIC model on subsets of size 0.5k, 1k, 2k, and 3k drawn from the Flickr 8K and MSCOCO data sets.
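The abstract does not spell out the N-gram Search algorithm, so the following is only a minimal Python sketch of one way an N-gram prior could be used to choose a pseudo label. It assumes the NIC decoder has already produced several candidate captions for an unlabeled image (for example, via beam search), each with a model score in [0, 1]. The bigram table, the scoring rule, the mixing weight, and all function names (`build_ngram_table`, `ngram_score`, `select_pseudo_label`) are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from typing import Iterable


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tokenize(caption: str) -> list[str]:
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + caption.lower().split() + ["</s>"]


def build_ngram_table(captions: Iterable[str], n: int = 2) -> Counter:
    """Hypothetical: count n-grams over human-written (labeled) captions."""
    table: Counter = Counter()
    for caption in captions:
        table.update(ngrams(tokenize(caption), n))
    return table


def ngram_score(caption: str, table: Counter, n: int = 2) -> float:
    """Hypothetical prior: fraction of the caption's n-grams seen in the table."""
    grams = ngrams(tokenize(caption), n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in table) / len(grams)


def select_pseudo_label(candidates: list[tuple[str, float]], table: Counter,
                        weight: float = 0.5, n: int = 2) -> str:
    """Hypothetical rule: pick the candidate maximizing a weighted mix of
    its model score (assumed in [0, 1]) and its n-gram prior score."""
    def combined(item: tuple[str, float]) -> float:
        caption, model_score = item
        return (1 - weight) * model_score + weight * ngram_score(caption, table, n)
    return max(candidates, key=combined)[0]


# Toy usage: a bigram table from three labeled captions, then two decoder
# candidates for one unlabeled image (scores are made up for illustration).
labeled = ["a dog runs on the grass", "a man rides a horse", "a dog plays on the beach"]
table = build_ngram_table(labeled)
candidates = [("a dog runs on the beach", 0.40), ("dog a the on runs beach", 0.45)]
pseudo_label = select_pseudo_label(candidates, table)
print(pseudo_label)
```

In this toy case the fluent candidate is chosen despite its slightly lower model score, because every one of its bigrams appears in the human-written captions. In a pseudo-label scheme, the selected caption would then be paired with the unlabeled image as additional training data.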