Abstract

In today's world of social media, almost everyone belongs to a social platform and actively interacts with others over the internet. People upload many pictures to their social media accounts with different captions, and thinking of an appropriate caption is a tedious process. A caption matters because it describes the content and meaning of a picture in meaningful sentences. An image caption generator can be built to produce captions, in language a human can understand, for input images of different types and resolutions. The model is built on the encoder-decoder concept, using a CNN (convolutional neural network) and an RNN (recurrent neural network). The CNN performs image feature extraction: treating the image as a matrix of pixels, it keeps only the important features. Instead of a plain CNN model, pre-trained ImageNet models with higher accuracy are used, and their results are compared using the BLEU score metric. For caption prediction, the beam search and argmax methods are used and compared. The supervised image captioning model described above is also compared with an unsupervised image captioning model built for this work. The Flickr8k dataset and then the MSCOCO dataset are used to train and test the model. Integrated into a mobile application, this model could be very useful for differently abled people who rely completely on the assistance of a text-to-speech feature.
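The difference between the argmax and beam search prediction methods mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table `TOY_MODEL`, the `<end>` token, and the function names are hypothetical stand-ins for a trained RNN decoder, chosen only so the example runs on its own.

```python
import math

END = "<end>"

# Hypothetical toy "decoder": maps a caption prefix to next-word probabilities.
# A real model would compute these from image features and the RNN state.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
    ("a", "dog"): {END: 1.0},
    ("a", "cat"): {END: 1.0},
    ("the", "dog"): {END: 1.0},
    ("the", "cat"): {END: 1.0},
}


def next_log_probs(prefix):
    """Return log-probabilities of each possible next word for a prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}


def greedy_decode(max_len=5):
    """Argmax decoding: always take the single most probable next word."""
    seq, score = [], 0.0
    while len(seq) < max_len:
        log_probs = next_log_probs(seq)
        tok = max(log_probs, key=log_probs.get)
        seq.append(tok)
        score += log_probs[tok]
        if tok == END:
            break
    return seq, score


def beam_search_decode(beam_width=2, max_len=5):
    """Beam search: keep the beam_width best partial captions at each step."""
    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Completed captions leave the beam; the rest keep expanding.
            (finished if seq[-1] == END else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])


if __name__ == "__main__":
    for name, (seq, score) in [("argmax", greedy_decode()),
                               ("beam search", beam_search_decode(2))]:
        print(f"{name}: {' '.join(seq)} (probability {math.exp(score):.2f})")
```

In this toy table, argmax commits to "a" because it is the most probable first word and ends with a caption of probability about 0.33, while a beam of width 2 also keeps "the" alive and finds "the dog" at about 0.36. With a beam width of 1 the two methods coincide.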
