Abstract

Image captioning is the task of generating a textual description of an image that is both syntactically and semantically correct. This paper surveys the literature on image captioning, from the advent of Artificial Intelligence through the Machine Learning era to early and current Deep Learning methodologies. It also aims to develop a system that predicts captions for given images with higher accuracy by combining the results of different Deep Learning techniques. The model is a neural network consisting of a vision CNN (encoder) followed by a language-generating RNN (decoder), which produces a complete natural-language sentence from an input image. State-of-the-art results are assessed by comparing three encoder-decoder models: the BLEU score of the CNN-LSTM model on the Flickr8k dataset is 0.44, the CNN-LSTM model with word embeddings on Flickr8k reaches 0.68, and the CNN-GRU model with visual attention on the MSCOCO dataset reaches 0.86.
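The abstract reports BLEU scores without specifying the exact variant used; as a rough illustration of how such a metric is computed, the following is a minimal sentence-level BLEU-1 sketch (clipped unigram precision with a brevity penalty), not the paper's evaluation code:

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times a brevity penalty.

    A simplified, single-reference sketch; real evaluations typically use
    multiple references and higher-order n-grams (BLEU-4).
    """
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate unigram count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# Five of six candidate words appear in the reference, lengths match:
print(round(bleu1("a dog runs on the grass", "a dog runs across the grass"), 2))  # → 0.83
```

Higher BLEU indicates greater n-gram overlap with the reference captions, which is why the CNN-GRU model with visual attention (0.86) outperforms the plain CNN-LSTM baseline (0.44) under this metric.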
