Abstract

Image captioning is a multi-modal problem linking computer vision and natural language processing, combining the challenges of image analysis and text generation. In the literature, most image captioning work has targeted the English language only. This paper proposes a new approach for image captioning in the Hindi language using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally and an official language of India. In recent years, significant advances have been made in image captioning using encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The encoder CNN extracts features from input images, while the decoder RNN performs language modeling. The proposed encoder-decoder architecture utilizes information multiplexing in the encoder CNN to achieve a performance gain in feature extraction. Extensive experimentation is carried out on the benchmark MSCOCO Hindi dataset, and significant improvements in BLEU score are reported compared to the baselines. Manual human evaluation of the adequacy and fluency of the generated captions further establishes the proposed method's efficacy in generating good-quality captions.
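The encoder-decoder flow described above can be sketched in miniature: an encoder maps an image to a fixed-length feature, and a decoder conditions on that feature to emit a caption token by token. The sketch below is purely illustrative pseudocode in Python; the actual model uses a CNN encoder and an RNN decoder with learned parameters, and every function, vocabulary entry, and pooling choice here is a hypothetical stand-in, not the paper's method.

```python
# Toy illustration of the encoder-decoder captioning pipeline.
# All names and logic are illustrative stand-ins, not the paper's model.

def encode(image):
    # Stand-in for a CNN encoder: collapse the "image" (a grid of
    # pixel intensities) to a single feature value by average pooling.
    flat = [p for row in image for p in row]
    return sum(flat) / len(flat)

def decode(feature, vocab, max_len=5):
    # Stand-in for an RNN decoder: emit one token per step, conditioning
    # each step on the image feature (a real decoder would also condition
    # on its hidden state and the previously emitted token).
    caption = []
    for t in range(max_len):
        # Toy "language model": derive a token index from the feature
        # and the step count, stopping at the end-of-sequence token.
        idx = int(feature * (t + 1)) % len(vocab)
        token = vocab[idx]
        if token == "<end>":
            break
        caption.append(token)
    return caption

image = [[0.2, 0.4], [0.6, 0.8]]
vocab = ["a", "dog", "runs", "<end>"]
feature = encode(image)
print(decode(feature, vocab))
```

In the real architecture the encoder output is a high-dimensional feature vector (here multiplexed for richer features, per the paper), and decoding is typically done greedily or with beam search over a learned vocabulary distribution.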
