Abstract

This paper presents an efficient approach to image captioning that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN) built from Long Short-Term Memory (LSTM) cells. Image captioning is a problem at the intersection of computer vision and deep learning that deals with generating relevant natural-language captions for a given input image. The work focuses on hyperparameter tuning of the CNN and the RNN so that the generated captions are as accurate as possible. The basic pipeline feeds an image to the CNN, which outputs a feature map; this feature map is then passed to the RNN, which outputs a sentence describing the image. Research on image captioning is relevant because it demonstrates the power of an encoder-decoder network composed of a CNN and an RNN, and it opens pathways for further research on other types of neural networks.
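A minimal sketch of the encoder-decoder pipeline described above, written in PyTorch. The choice of a ResNet-18 backbone, the layer sizes, and the way the image features are prepended to the caption sequence are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    """CNN encoder: maps an image to a fixed-length feature vector."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet18(weights=None)  # any CNN backbone works here
        # Drop the classification head; keep the convolutional feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.backbone(images).flatten(1)  # (batch, 512)
        return self.fc(features)                     # (batch, embed_size)

class Decoder(nn.Module):
    """LSTM decoder: generates a caption conditioned on the image features."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image features as the first step of the input sequence,
        # then let the LSTM predict the next word at every position.
        embeddings = self.embed(captions)                         # (batch, T, embed)
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)  # per-step vocabulary logits

# Usage sketch: encode a batch of images, then score a batch of captions.
encoder = Encoder(embed_size=256)
decoder = Decoder(embed_size=256, hidden_size=512, vocab_size=10000)
images = torch.randn(4, 3, 224, 224)      # dummy image batch
captions = torch.randint(0, 10000, (4, 15))  # dummy token ids
logits = decoder(encoder(images), captions)
```

At inference time the same decoder would be run autoregressively, feeding each predicted word back in as the next input until an end-of-sentence token is produced.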
