Abstract

Automatically generating a description from the content of an image is one of the fundamental problems in artificial intelligence. The task, known as image caption generation, draws on both computer vision and natural language processing. Much research has been carried out in this field, but it has focused mainly on generating image descriptions in English, as existing image caption datasets are mostly in English. Image captioning, however, should not be restricted to a single language. The lack of image captioning datasets in languages other than English is a problem, especially for a morphologically rich language such as Hindi. To tackle this problem, this research constructs a Hindi image caption dataset, called the Flickr8k-Hindi Datasets, by translating the captions of the Flickr8k dataset with Google Cloud Translator. The Flickr8k-Hindi Datasets comprise four datasets that differ in the number of descriptions per image and in whether the descriptions are cleaned. This study also investigates the most effective way to generate image descriptions with an encoder-decoder neural network model. The experiments show that training the model with a single clean description per image produces higher-quality captions than training with five unclean descriptions per image, even though the model trained with five unclean descriptions achieves a BLEU-1 score of 0.585, the current state of the art.
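For illustration, the sketch below shows one common form of the encoder-decoder captioning architecture the abstract refers to: pre-extracted CNN image features are merged with an LSTM encoding of the partial caption to predict the next word. The hyperparameters, layer sizes, and feature dimension are assumptions for the sketch, not the authors' reported configuration.

```python
# Minimal sketch of a merge-style encoder-decoder caption model (assumed
# hyperparameters; not the exact architecture used in the paper).
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

def build_caption_model(vocab_size, max_caption_len, feature_dim=2048, embed_dim=256):
    # Encoder branch: pre-extracted CNN features (e.g. a 2048-d vector
    # from an ImageNet-pretrained backbone) projected to the embedding size.
    image_input = Input(shape=(feature_dim,))
    img = Dropout(0.5)(image_input)
    img = Dense(embed_dim, activation="relu")(img)

    # Decoder branch: embed the partial (Hindi) caption and summarize it with an LSTM.
    caption_input = Input(shape=(max_caption_len,))
    seq = Embedding(vocab_size, embed_dim, mask_zero=True)(caption_input)
    seq = Dropout(0.5)(seq)
    seq = LSTM(256)(seq)

    # Merge both branches and predict the next word of the caption.
    merged = add([img, seq])
    merged = Dense(256, activation="relu")(merged)
    output = Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, caption_input], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```

At inference time such a model is run word by word: the caption generated so far is fed back into the decoder branch until an end-of-sequence token is produced, and generated captions are then scored against the reference descriptions with BLEU.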
