Abstract

Image captioning, the task of generating a natural-language description of an input image, is a rapidly growing research area in computer vision. In recent years, the problem of automatically synthesising descriptive sentences for photographs has attracted increasing interest from both the natural language processing and computer vision communities. The task is demanding: it requires a semantic understanding of the image as well as the ability to produce precise and accurate description sentences. In the proposed approach, a Long Short-Term Memory (LSTM) network arranges the available keywords into meaningful sentences, while a hybrid system based on multilayer Convolutional Neural Networks builds the vocabulary used to characterise the visuals. The convolutional neural network, trained on captioned images, compares the target image against a sizeable dataset of training images to deliver an accurate description. We demonstrate the effectiveness of the proposed methodology on the Flickr 8K dataset.
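The encoder-decoder pattern described above (a CNN that embeds the image, feeding an LSTM that emits the caption word by word) can be sketched in PyTorch. This is a minimal illustrative toy, not the paper's architecture: every layer size, the small convolutional stack standing in for the multilayer CNN, and the vocabulary size are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class CaptionNet(nn.Module):
    """Toy encoder-decoder captioner: CNN image encoder + LSTM decoder.

    Layer sizes are illustrative only, not the configuration from the paper.
    """
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=256):
        super().__init__()
        # Small convolutional encoder standing in for a pretrained multilayer CNN.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The image feature acts as the first "token" fed to the LSTM,
        # conditioning the sentence the decoder generates.
        feats = self.encoder(images).unsqueeze(1)   # (B, 1, E)
        words = self.embed(captions)                # (B, T, E)
        seq = torch.cat([feats, words], dim=1)      # (B, T+1, E)
        hidden, _ = self.lstm(seq)
        return self.fc(hidden)                      # per-step vocab logits

model = CaptionNet()
images = torch.randn(2, 3, 64, 64)                  # batch of 2 RGB images
captions = torch.randint(0, 1000, (2, 5))           # 2 captions, 5 word ids each
logits = model(images, captions)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

In training, the logits would be compared against the ground-truth caption with a cross-entropy loss; at inference, words are sampled step by step from the LSTM until an end-of-sentence token appears.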
