Abstract

Image captioning is the task of generating a textual description of an image and has wide applications across various domains. Research in this area is ongoing and attracting increasing attention because it merges the fields of computer vision and natural language processing. In the presented work, an image caption generation model combining a probabilistic and a neural framework is built following an encoder-decoder scheme. The image is fed to a deep learning classifier, a Convolutional Neural Network (CNN), and a Long Short-Term Memory (LSTM) model is used to generate the descriptive sentences. The captions produced by the proposed model are compared against those of conventional models through statistical analysis using standard and popular similarity metrics: cosine similarity, precision, and recall. The evaluation results of the predicted captions against the ground-truth captions are presented, and the applicability of these evaluation methods in the context of image captioning is discussed and justified.
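To make the encoder-decoder scheme concrete, the following is a minimal sketch in PyTorch of a CNN encoder feeding a fixed-length image feature into an LSTM decoder. The choice of a pretrained ResNet-50 backbone, the frozen weights, the layer sizes, and all names are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Encodes an image into a fixed-length feature vector."""
    def __init__(self, embed_size: int):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the classification head; keep the convolutional feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # assumption: the pretrained backbone is frozen
            features = self.backbone(images).flatten(1)
        return self.fc(features)

class DecoderLSTM(nn.Module):
    """Generates a caption token by token, conditioned on the image feature."""
    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Teacher forcing: prepend the image feature as the first "token",
        # then feed the ground-truth caption shifted by one position.
        embeddings = self.embed(captions[:, :-1])
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)  # logits over the vocabulary at each time step
```

In this arrangement the CNN summarizes the image once, and the LSTM unrolls that summary into a word sequence, which matches the encoder-decoder division of labor the abstract describes.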
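The abstract names cosine similarity, precision, and recall as the evaluation metrics. Below is a hedged sketch of how such sentence-level scores are commonly computed between a predicted caption and a reference caption; the bag-of-words vectorization with raw term counts is an assumption, since the abstract does not specify the representation.

```python
from collections import Counter
import math

def cosine_similarity(pred: str, ref: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two captions."""
    a, b = Counter(pred.lower().split()), Counter(ref.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def precision_recall(pred: str, ref: str) -> tuple[float, float]:
    """Token-level precision and recall of the predicted caption."""
    p, r = set(pred.lower().split()), set(ref.lower().split())
    overlap = len(p & r)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(r) if r else 0.0
    return precision, recall

# Example with hypothetical captions:
print(cosine_similarity("a dog runs on grass", "a dog is running on the grass"))
print(precision_recall("a dog runs on grass", "a dog is running on the grass"))
```

Precision here rewards predicted words that also appear in the reference, recall rewards coverage of the reference words, and cosine similarity weighs both in a single vector-space score.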
