Abstract

Generating captions automatically for an image, commonly termed Automated Image Captioning (AIC), is a demanding task: it combines the essence of both image processing and natural language processing. This paper deals with the generation of descriptive sentences for a given image. An encoder-decoder architecture is implemented to build the image captioning system. Three pre-trained convolutional neural network (CNN) models, EfficientNet, ResNet, and DenseNet, are compared as encoders, while a Long Short-Term Memory (LSTM) network, a recurrent neural network (RNN) variant, serves as the decoder. Existing image captioning mechanisms are reviewed through a literature survey, and the performance of the models is evaluated using standard evaluation metrics. Finally, motivated by one application of image captioning, assistance for visually impaired persons, the trained model is deployed on a real-time device interfaced with a camera to produce a prototype.

Keywords: Image captioning; Image processing; Natural language processing; Long short-term memory; Convolutional neural network; Recurrent neural network
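
To make the architecture summarized above concrete, the following is a minimal sketch (not the authors' code) of a merge-style CNN-encoder / LSTM-decoder captioning model, assuming a TensorFlow/Keras stack and pre-extracted image features (e.g., a ResNet-style 2048-dimensional pooled output). The values of vocab_size, max_len, and feat_dim are illustrative placeholders, not figures from the paper.

```python
# Minimal encoder-decoder captioning sketch (assumed setup, not the paper's code).
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 8000   # assumed vocabulary size
max_len = 34        # assumed maximum caption length in tokens
feat_dim = 2048     # assumed pooled CNN feature size (e.g., ResNet-style encoder)

# Encoder branch: pre-extracted CNN features projected into a shared space.
img_in = layers.Input(shape=(feat_dim,), name="image_features")
img_emb = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_in))

# Decoder branch: partial caption -> word embeddings -> LSTM state.
txt_in = layers.Input(shape=(max_len,), name="caption_tokens")
txt_emb = layers.Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt_feat = layers.LSTM(256)(layers.Dropout(0.5)(txt_emb))

# Merge image and language representations and predict the next word.
merged = layers.add([img_emb, txt_feat])
hidden = layers.Dense(256, activation="relu")(merged)
out = layers.Dense(vocab_size, activation="softmax")(hidden)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model is typically driven token by token: the image features stay fixed while the partial caption grows from a start token until an end token or max_len is reached. Swapping the encoder among EfficientNet, ResNet, and DenseNet, as the paper does, would only change how the feat_dim feature vector is produced.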
