Abstract

Generating captions automatically for an image, commonly termed Automated Image Captioning (AIC), is a demanding task: it combines the essence of both image processing and natural language processing. This paper deals with the generation of descriptive sentences for a given image. An encoder-decoder architecture is implemented to build the image captioning system. Three pre-trained convolutional neural network (CNN) models, EfficientNet, ResNet, and DenseNet, are compared as encoders, while a Long Short-Term Memory (LSTM) network, a recurrent neural network (RNN) variant, serves as the decoder. Existing image captioning mechanisms are reviewed through a literature survey, and the performance of the models is evaluated using standard evaluation metrics. Finally, motivated by one application of image captioning, assistance for visually impaired persons, the trained model is deployed on a real-time device interfaced with a camera to produce a prototype.

Keywords: Image captioning; Image processing; Natural language processing; Long short-term memory; Convolutional neural network; Recurrent neural network
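
To make the architecture summarized above concrete, the following is a minimal sketch (not the authors' code) of a merge-style CNN-encoder / LSTM-decoder captioning model, assuming a TensorFlow/Keras stack and pre-extracted image features (e.g., a ResNet-style 2048-dimensional pooled output). The values of vocab_size, max_len, and feat_dim are illustrative placeholders, not figures from the paper.

```python
# Minimal encoder-decoder captioning sketch (assumed setup, not the paper's code).
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 8000   # assumed vocabulary size
max_len = 34        # assumed maximum caption length in tokens
feat_dim = 2048     # assumed pooled CNN feature size (e.g., ResNet-style encoder)

# Encoder branch: pre-extracted CNN features projected into a shared space.
img_in = layers.Input(shape=(feat_dim,), name="image_features")
img_emb = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_in))

# Decoder branch: partial caption -> word embeddings -> LSTM state.
txt_in = layers.Input(shape=(max_len,), name="caption_tokens")
txt_emb = layers.Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt_feat = layers.LSTM(256)(layers.Dropout(0.5)(txt_emb))

# Merge image and language representations and predict the next word.
merged = layers.add([img_emb, txt_feat])
hidden = layers.Dense(256, activation="relu")(merged)
out = layers.Dense(vocab_size, activation="softmax")(hidden)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model is typically driven token by token: the image features stay fixed while the partial caption grows from a start token until an end token or max_len is reached. Swapping the encoder among EfficientNet, ResNet, and DenseNet, as the paper does, would only change how the feat_dim feature vector is produced.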
