Abstract

Image captioning is a multi-modal problem linking computer vision and natural language processing, combining the challenges of image analysis and text generation. In the literature, most image captioning work has targeted the English language only. This paper proposes a new approach for image captioning in the Hindi language using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally and an official language of India. In recent years, significant advances have been made in image captioning using encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The encoder CNN extracts features from input images, while the decoder RNN performs language modeling. The proposed encoder-decoder architecture utilizes information multiplexing in the encoder CNN to achieve a performance gain in feature extraction. Extensive experimentation is carried out on the benchmark MSCOCO Hindi dataset, and significant improvements in BLEU score are reported compared to the baselines. Manual human evaluation of the adequacy and fluency of the generated captions further establishes the proposed method's efficacy in generating good-quality captions.
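The encoder-decoder flow described above can be sketched in miniature: an encoder maps an image to a fixed-length feature, and a decoder conditions on that feature to emit a caption token by token. The sketch below is purely illustrative pseudocode in Python; the actual model uses a CNN encoder and an RNN decoder with learned parameters, and every function, vocabulary entry, and pooling choice here is a hypothetical stand-in, not the paper's method.

```python
# Toy illustration of the encoder-decoder captioning pipeline.
# All names and logic are illustrative stand-ins, not the paper's model.

def encode(image):
    # Stand-in for a CNN encoder: collapse the "image" (a grid of
    # pixel intensities) to a single feature value by average pooling.
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def decode(feature, vocab, max_len=5):
    # Stand-in for an RNN decoder: emit one token per step, conditioning
    # each step on the image feature (a real decoder would also condition
    # on its hidden state and the previously emitted token).
    caption = []
    for t in range(max_len):
        # Toy "language model": derive a token index from the feature
        # and the step count, stopping at the end-of-sequence token.
        idx = int(feature * (t + 1)) % len(vocab)
        token = vocab[idx]
        if token == "<end>":
            break
        caption.append(token)
    return caption

image = [[0.2, 0.4], [0.6, 0.8]]
vocab = ["a", "dog", "runs", "<end>"]
feature = encode(image)
print(decode(feature, vocab))
```

In the real architecture the encoder output is a high-dimensional feature vector (here multiplexed for richer features, per the paper), and decoding is typically done greedily or with beam search over a learned vocabulary distribution.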
