Abstract

Automatically generating a description from the content of an image is one of the fundamental problems in artificial intelligence. The task, known as image caption generation, draws on both computer vision and natural language processing. Much research has been carried out in this field, but it has focused mainly on generating image descriptions in English, as existing image caption datasets are mostly in English. Image captioning, however, should not be restricted to a single language. The lack of image captioning datasets in languages other than English is a problem, especially for a morphologically rich language such as Hindi. To tackle this problem, this research constructs a Hindi image caption dataset, called the Flickr8k-Hindi Datasets, by translating the captions of the Flickr8k dataset with Google Cloud Translator. The Flickr8k-Hindi Datasets comprise four datasets that differ in the number of descriptions per image and in whether the descriptions are cleaned. This study also investigates the most effective way to generate image descriptions with an encoder-decoder neural network model. The experiments show that training the model with a single clean description per image produces higher-quality captions than training with five unclean descriptions per image, even though the model trained with five unclean descriptions achieves a BLEU-1 score of 0.585, the current state of the art.
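For illustration, the sketch below shows one common form of the encoder-decoder captioning architecture the abstract refers to: pre-extracted CNN image features are merged with an LSTM encoding of the partial caption to predict the next word. The hyperparameters, layer sizes, and feature dimension are assumptions for the sketch, not the authors' reported configuration.

```python
# Minimal sketch of a merge-style encoder-decoder caption model (assumed
# hyperparameters; not the exact architecture used in the paper).
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

def build_caption_model(vocab_size, max_caption_len, feature_dim=2048, embed_dim=256):
    # Encoder branch: pre-extracted CNN features (e.g. a 2048-d vector
    # from an ImageNet-pretrained backbone) projected to the embedding size.
    image_input = Input(shape=(feature_dim,))
    img = Dropout(0.5)(image_input)
    img = Dense(embed_dim, activation="relu")(img)

    # Decoder branch: embed the partial (Hindi) caption and summarize it with an LSTM.
    caption_input = Input(shape=(max_caption_len,))
    seq = Embedding(vocab_size, embed_dim, mask_zero=True)(caption_input)
    seq = Dropout(0.5)(seq)
    seq = LSTM(256)(seq)

    # Merge both branches and predict the next word of the caption.
    merged = add([img, seq])
    merged = Dense(256, activation="relu")(merged)
    output = Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, caption_input], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```

At inference time such a model is run word by word: the caption generated so far is fed back into the decoder branch until an end-of-sequence token is produced, and generated captions are then scored against the reference descriptions with BLEU.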
