Efficient Channel Attention Based Encoder–Decoder Approach for Image Captioning in Hindi

Abstract

Image captioning refers to the process of generating a textual description of the objects and activities present in a given image. It connects two fields of artificial intelligence: computer vision and natural language processing, which deal with image understanding and language modeling, respectively. In the existing literature, most work on image captioning has been carried out for the English language. This article presents a novel method for image captioning in the Hindi language using an encoder–decoder based deep learning architecture with efficient channel attention. The key contribution of this work is the deployment of an efficient channel attention mechanism, together with Bahdanau attention and a gated recurrent unit, for developing a Hindi image captioning model. Color images usually consist of three channels, namely red, green, and blue. The channel attention mechanism focuses on an image’s important channels while performing convolution, essentially assigning higher importance to some channels than to others, and has been shown to have great potential for improving the efficiency of deep convolutional neural networks (CNNs). The proposed encoder–decoder architecture utilizes the recently introduced ECA-Net CNN to integrate the channel attention mechanism. Hindi is the fourth most spoken language globally, widely spoken in India and South Asia, and is India’s official language. A dataset for image captioning in Hindi was created manually by translating the well-known MSCOCO dataset from English to Hindi. The efficiency of the proposed method is compared with other baselines in terms of Bilingual Evaluation Understudy (BLEU) scores, and the results show that the proposed method outperforms the baselines, attaining improvements of 0.59%, 2.51%, 4.38%, and 3.30% in BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores, respectively, over the state of the art. The quality of the generated captions is further assessed manually in terms of adequacy and fluency to illustrate the proposed method’s efficacy.
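
For concreteness, here is a minimal PyTorch sketch of an ECA-style channel attention block of the kind the abstract builds on, following the adaptive kernel-size rule from the ECA-Net paper; the class name and default hyperparameters are illustrative, not taken from this article.

```python
import math
import torch
import torch.nn as nn

class ECABlock(nn.Module):
    """Efficient Channel Attention (ECA-Net style) sketch.

    Channel weights come from a 1-D convolution over the globally
    pooled channel descriptor; the kernel size k is chosen
    adaptively from the channel count C.
    """
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1          # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel descriptor (B, 1, C)
        y = self.pool(x).squeeze(-1).transpose(-1, -2)
        y = self.sigmoid(self.conv(y)).transpose(-1, -2).unsqueeze(-1)
        return x * y                        # re-weight the channels
```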

Similar Papers
  • Conference Article
  • Cited by 10
  • 10.1109/smc52423.2021.9658859
An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi
  • Oct 17, 2021
  • Santosh Kumar Mishra + 3 more

Image captioning is a multi-modal problem linking computer vision and natural language processing, combining the challenges of image analysis and text generation. In the literature, most image captioning work has been carried out in the English language only. This paper proposes a new approach for image captioning in the Hindi language using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally; it is India’s official language. In recent years, significant advances in image captioning have come from encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs): the encoder CNN extracts features from input images, while the decoder RNN performs language modeling. The proposed encoder-decoder architecture uses information multiplexing in the encoder CNN to achieve a performance gain in feature extraction. Extensive experiments on the benchmark MSCOCO Hindi dataset show significant improvements in BLEU score over the baselines, and manual human evaluation of the generated captions in terms of adequacy and fluency further establishes the proposed method’s efficacy in generating good-quality captions.
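
As a point of reference, below is a minimal PyTorch sketch of the standard CNN-encoder/RNN-decoder captioning backbone the abstract describes. The paper’s information-multiplexing encoder is not reproduced here; a plain ResNet-50 encoder and a GRU decoder stand in, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    """Generic CNN-encoder / RNN-decoder captioner (sketch only)."""
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden: int = 512):
        super().__init__()
        cnn = models.resnet50(weights=None)        # stand-in backbone encoder
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop classifier
        self.fc = nn.Linear(2048, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, captions):
        feats = self.fc(self.encoder(images).flatten(1))       # (B, E)
        # Image features seed the sequence; shifted caption tokens follow.
        inputs = torch.cat([feats.unsqueeze(1),
                            self.embed(captions[:, :-1])], dim=1)
        hidden, _ = self.gru(inputs)
        return self.out(hidden)                                # word logits
```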

  • Conference Article
  • Cited by 8
  • 10.1109/icaiic57133.2023.10067039
Captioning Remote Sensing Images Using Transformer Architecture
  • Feb 20, 2023
  • Wrucha Nanal + 1 more

Image captioning aspires to describe images with machines, combining the Computer Vision (CV) and Natural Language Processing (NLP) fields. The current state of the art for image captioning uses attention-based encoder-decoder models. An attention-based model uses an ‘attention mechanism’ that focuses on a particular section of the image to generate the corresponding caption word, and the NLP side of such a model uses Long Short-Term Memory (LSTM) networks for word generation. Attention-based models did not emphasize the relative arrangement of words in a caption, thereby ignoring the context of the sentence. Inspired by the versatility of Transformers in NLP, this work utilises the Transformer architecture for the image captioning use case. It also makes use of a pretrained Bidirectional Encoder Representations from Transformers (BERT) model, which generates a contextually rich embedding of a caption. The multi-head attention of the Transformer establishes a strong correlation between the image and the contextually aware caption. The experiment is performed on the Remote Sensing Image Captioning Dataset, and the model is evaluated using NLP metrics such as Bilingual Evaluation Understudy 1–4 (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The proposed model shows better results for a few of the metrics.
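
To illustrate the core idea, the following PyTorch sketch shows cross-attention in which BERT caption embeddings act as queries over image-region features; the dimensions, sequence lengths, and tensor shapes are assumptions for illustration, not the paper’s configuration.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 49 image-region features and a BERT-embedded
# caption of 20 tokens, both projected to a 768-d model dimension.
d_model, n_heads = 768, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

image_feats = torch.randn(1, 49, d_model)   # encoder output (e.g. CNN grid)
caption_emb = torch.randn(1, 20, d_model)   # contextual BERT embeddings

# Queries come from the caption, keys/values from the image, so every
# generated word can attend to the image regions that support it.
attended, weights = cross_attn(query=caption_emb,
                               key=image_feats,
                               value=image_feats)
print(attended.shape, weights.shape)        # (1, 20, 768) (1, 20, 49)
```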

  • Research Article
  • Cited by 5
  • 10.22219/kinetik.v7i4.1568
Image Captioning using Hybrid of VGG16 and Bidirectional LSTM Model
  • Nov 10, 2022
  • Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control
  • Yufis Azhar + 3 more

Image captioning is one of the biggest challenges at the intersection of computer vision and natural language processing. Many studies have addressed image captioning, but their evaluation results remain low; this study therefore focuses on improving on the results of previous work. We use the Flickr8k dataset and the VGG16 Convolutional Neural Network (CNN) as an encoder to extract features from images, and a Recurrent Neural Network (RNN) using the Bidirectional Long Short-Term Memory (BiLSTM) method as the decoder. The extracted image feature vectors are passed to the BiLSTM to produce descriptions that match the input image or visual content; the captions provide information on an object’s name, location, color, size, and features, as well as its surroundings. A greedy search algorithm with the argmax function and a beam search algorithm are used for caption generation, evaluated with Bilingual Evaluation Understudy (BLEU) scores. The best result in this study is obtained by the VGG16 model with Bidirectional LSTM using beam search with parameter K = 3, achieving a BLEU-1 score of 0.60593, which is superior to previous studies.
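
The beam search decoding the study uses (with K = 3) can be sketched generically as below; step_fn is a hypothetical stand-in for the VGG16+BiLSTM decoder’s next-word distribution, not an interface from the paper.

```python
import heapq

def beam_search(step_fn, start_token, end_token, k=3, max_len=20):
    """Generic beam search sketch (K = 3 as in the study).

    step_fn(sequence) must return an iterable of (token, log_prob)
    pairs for the next position.
    """
    beams = [(0.0, [start_token])]           # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:         # keep finished hypotheses
                candidates.append((score, seq))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((score + logp, seq + [tok]))
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
        if all(seq[-1] == end_token for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])[1]
```

Greedy search is the K = 1 special case: at each step only the argmax token survives, which is cheaper but often yields lower BLEU than keeping K hypotheses alive.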

  • Research Article
  • Cited by 30
  • 10.13053/cys-23-3-3269
A Deep Attention based Framework for Image Caption Generation in Hindi Language
  • Oct 7, 2019
  • Computación y Sistemas
  • Rijul Dhir + 3 more

Image captioning refers to the process of generating a textual description for an image that defines the objects and activities within it. It lies at the intersection of computer vision and natural language processing, where computer vision is used to understand the content of an image and language modelling from natural language processing is used to convert the image into words in the right order. A large number of works exist for image captioning in the English language, but no work existed for image captioning in the Hindi language. Hindi is the official language of India, and it is the fourth most-spoken language in the world, after Mandarin, Spanish, and English. The current paper attempts to bridge this gap. A novel attention-based architecture for image captioning in the Hindi language is proposed: a convolutional neural network is used as an encoder to extract features from an input image, and a gated recurrent unit based neural network is used as a decoder to perform language modelling at the word level. In between, an attention mechanism helps the decoder look at the important portions of the image. To show the efficacy of the proposed model, we first created a manually annotated image captioning training corpus in Hindi corresponding to the popular MS COCO English dataset, covering around 80,000 images. Experimental results show that the proposed model attains a BLEU-1 score of 0.5706 on this dataset.
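
A minimal PyTorch sketch of the Bahdanau-style additive attention such an encoder-decoder typically places between the CNN features and the GRU decoder is given below; the layer names and attention dimension are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h_i) = v^T tanh(W1 h_i + W2 s)."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, attn_dim)
        self.w2 = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, features, dec_state):
        # features: (B, N, feat_dim) image regions; dec_state: (B, hidden_dim)
        scores = self.v(torch.tanh(self.w1(features) +
                                   self.w2(dec_state).unsqueeze(1)))  # (B, N, 1)
        alpha = torch.softmax(scores, dim=1)          # attention weights
        context = (alpha * features).sum(dim=1)       # (B, feat_dim) context
        return context, alpha.squeeze(-1)
```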

  • Research Article
  • Cited by 28
  • 10.3390/s23083835
EFFNet-CA: An Efficient Driver Distraction Detection Based on Multiscale Features Extractions and Channel Attention Mechanism
  • Apr 8, 2023
  • Sensors (Basel, Switzerland)
  • Taimoor Khan + 2 more

Driver distraction is considered a main cause of road accidents; every year, thousands of people sustain serious injuries, and many lose their lives. Road accidents continue to increase due to driver distractions such as talking, drinking, and using electronic devices. Several researchers have developed deep learning techniques for detecting driver activity, but current approaches need further improvement because of the high number of false predictions in real time. To cope with these issues, it is important to develop an effective technique that detects driver behavior in real time to protect human lives and property. In this work, we develop a convolutional neural network (CNN)-based technique that integrates a channel attention (CA) mechanism for efficient and effective detection of driver behavior. We compare the proposed model with various backbone models, with and without CA integration: VGG16, VGG16+CA, ResNet50, ResNet50+CA, Xception, Xception+CA, InceptionV3, InceptionV3+CA, and EfficientNetB0. The proposed model obtains optimal performance in terms of accuracy, precision, recall, and F1-score on two well-known datasets, AUC Distracted Driver (AUCD2) and State Farm Distracted Driver Detection (SFD3), achieving 99.58% accuracy on SFD3 and 98.97% accuracy on AUCD2.
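
As one common formulation of such a CA block, here is a squeeze-and-excitation-style channel attention module in PyTorch that can be appended to any of the listed backbones; the paper’s exact block may differ, and the reduction ratio is an assumed default.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (one common
    CA formulation; illustrative, not the paper's exact block)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: global context
        self.fc = nn.Sequential(                  # excite: per-channel gate
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # re-weight backbone features
```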

  • Research Article
  • Cited by 42
  • 10.1145/3432246
A Hindi Image Caption Generation Framework Using Deep Learning
  • Mar 15, 2021
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Santosh Kumar Mishra + 3 more

Image captioning is the process of generating a textual description of an image that aims to describe its salient parts. It is an important problem involving computer vision, used for understanding images, and natural language processing, used for language modeling. A lot of work has been done on image captioning for the English language; in this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Different types of attention-based architectures are then developed for image captioning in the Hindi language; these attention mechanisms have never before been used for Hindi. The results of the proposed model are compared with several baselines in terms of BLEU scores and show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of the proposed approach. Availability of resources: the code is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language; the dataset will be made available at http://www.iitp.ac.in/~ai-nlp-ml/resources.html.
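
BLEU evaluation of generated Hindi captions can be reproduced with standard tooling; the sketch below uses NLTK’s corpus_bleu with placeholder tokenized captions, not data from the paper.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Tokenized Hindi captions: each hypothesis is scored against the
# image's reference captions (the strings below are placeholders).
references = [[["एक", "आदमी", "घोड़े", "पर", "सवार", "है"]]]
hypotheses = [["एक", "आदमी", "घोड़े", "पर", "है"]]

smooth = SmoothingFunction().method1
for n, weights in enumerate([(1, 0, 0, 0), (0.5, 0.5, 0, 0),
                             (1/3, 1/3, 1/3, 0), (0.25, 0.25, 0.25, 0.25)], 1):
    score = corpus_bleu(references, hypotheses,
                        weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")   # BLEU-1 through BLEU-4
```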

  • Conference Article
  • Cited by 11
  • 10.1109/esci50559.2021.9396839
Image Captioning Methods and Metrics
  • Mar 5, 2021
  • Omkar Sargar + 1 more

Image captioning is one of the emerging research topics in the field of AI. It uses a combination of Computer Vision (CV) and Natural Language Processing (NLP) to derive features from an image, identify objects, actions, and their relationships, and generate a description of the image. It is an important concept in artificial intelligence, applied in fields such as aids for the blind, self-driving cars, and many more. This paper presents a concise overview of the state of the art in image captioning and its methods for caption generation using deep learning. We also describe an approach to image caption generation using Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) models in a deep learning framework, making the system capable of creating sentences for images. It uses an encoder-decoder architecture, where a CNN is used for image vector generation and an LSTM is used for generating a logical sentence using NLP concepts. Finally, we experimentally compare the proposed system with numerous existing systems and show its effectiveness.

  • Conference Article
  • Cited by 3
  • 10.1109/hora55278.2022.9799958
Novel Image Caption System Using Deep Convolutional Neural Networks (VGG16)
  • Jun 9, 2022
  • Alaa Sabeeh Salim + 5 more

With advances in artificial intelligence and computer vision, image captioning (IC) has progressively attracted researchers’ attention. IC automatically generates natural-language text descriptions according to image content, combining knowledge from computer vision and natural language processing. In this article, a novel image captioning system was developed and validated on the Flickr8k dataset. The designed system consists of Long Short-Term Memory (LSTM) and VGG16 Convolutional Neural Network (CNN) components. The main improvements lie in the structure of the designed system, achieved by adapting the batch size and by studying deep learning parameters such as regularization terms added to the loss function, CNN optimizers, and dropout layers. The results show the effectiveness of the designed system. Finally, the article highlights some open challenges in the image description task.
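
A hedged Keras sketch of a VGG16-feature + LSTM captioner exposing the dropout and L2-regularization knobs the abstract studies is shown below; the merge-style layout, layer sizes, and vocabulary size are assumptions for illustration, not the paper’s configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Illustrative merge-style captioner: a VGG16-feature branch and an
# LSTM text branch are combined before predicting the next word.
vocab_size, max_len, l2 = 5000, 34, regularizers.l2(1e-4)

img_in = layers.Input(shape=(4096,))            # precomputed VGG16 fc features
img = layers.Dropout(0.5)(img_in)               # dropout knob
img = layers.Dense(256, activation="relu", kernel_regularizer=l2)(img)

txt_in = layers.Input(shape=(max_len,))
txt = layers.Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt = layers.Dropout(0.5)(txt)
txt = layers.LSTM(256)(txt)

merged = layers.add([img, txt])
out = layers.Dense(vocab_size, activation="softmax")(
    layers.Dense(256, activation="relu")(merged))

model = tf.keras.Model([img_in, txt_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Batch size, the regularization strength, and the choice of optimizer are then the tuning parameters varied in training runs.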

  • Research Article
  • Cited by 8
  • 10.14569/ijacsa.2023.0140326
An Efficient Deep Learning based Hybrid Model for Image Caption Generation
  • Jan 1, 2023
  • International Journal of Advanced Computer Science and Applications
  • Mehzabeen Kaur + 1 more

In recent years, with the increasing use of different social media platforms, image captioning plays a major role in automatically describing a whole image in a natural language sentence. Image captioning is the process of automatically generating a natural-language textual description of an image using artificial intelligence techniques; computer vision and natural language processing are its key components. A Convolutional Neural Network (CNN), part of computer vision, is used for object detection and feature extraction, while Natural Language Processing (NLP) techniques help generate the textual caption of the image. Generating a suitable image description by machine is a challenging task, as it depends on detecting objects, their locations, and their semantic relationships, expressed in a human-understandable language such as English. In this paper, our aim is to develop an encoder-decoder based hybrid image captioning approach using VGG16, ResNet50, and YOLO. VGG16 and ResNet50 are pre-trained feature extraction models trained on millions of images, and YOLO is used for real-time object detection. The approach first extracts image features using VGG16, ResNet50, and YOLO and concatenates the results into a single representation. Finally, LSTM and BiGRU are used to generate the textual description of the image. The proposed model is evaluated using BLEU, METEOR, and ROUGE scores.
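
The feature-fusion step the abstract describes amounts to concatenating the three extractors’ vectors before decoding; a minimal PyTorch sketch follows, with purely illustrative feature dimensions.

```python
import torch

# Hypothetical per-image feature vectors from the three extractors;
# the dimensions are illustrative, not taken from the paper.
vgg_feats = torch.randn(1, 4096)      # VGG16 fully-connected features
resnet_feats = torch.randn(1, 2048)   # ResNet50 pooled features
yolo_feats = torch.randn(1, 512)      # encoded YOLO detections

fused = torch.cat([vgg_feats, resnet_feats, yolo_feats], dim=1)
print(fused.shape)   # torch.Size([1, 6656]) -> fed to the LSTM/BiGRU decoder
```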

  • Research Article
  • Cited by 60
  • 10.1145/3009906
Computer Vision and Natural Language Processing
  • Dec 12, 2016
  • ACM Computing Surveys
  • Peratham Wiriyathammabhum + 3 more

Integrating computer vision and natural language processing is a novel interdisciplinary field that has received a lot of attention recently. In this survey, we provide a comprehensive introduction to the integration of computer vision and natural language processing in multimedia and robotics applications, with more than 200 key references. The tasks we survey include visual attributes, image captioning, video captioning, visual question answering, visual retrieval, human-robot interaction, robotic actions, and robot navigation. We also emphasize strategies for integrating computer vision and natural language processing models under the unified theme of distributional semantics, drawing an analogy between image embedding in computer vision and word embedding in natural language processing. Finally, we present a unified view of the field and propose possible future directions.

  • Conference Article
  • Cited by 3
  • 10.1109/iceca55336.2022.10009435
A Comparative Study on Optimizers for Automatic Image Captioning
  • Dec 1, 2022
  • Eliyah Immanuel Thavaraj A + 2 more

In the field of artificial intelligence, computer vision and natural language processing are used to automatically describe an image’s contents. A generative neural model, drawing on machine translation and computer vision, is developed to produce natural phrases that explain the image. The architecture includes recurrent neural networks (RNNs) and convolutional neural networks (CNNs): the RNN is used to create phrases, whereas the CNN is used to extract characteristics from images. The model is trained to produce captions that, given an input image, describe it almost exactly. The outcome of these algorithms is determined by several factors, including feature extraction, caption generation, and optimizer selection. Our goal is to conduct a comparative analysis of several optimizers to determine which achieves the highest accuracy for a deep learning model. The deep learning model is trained with various optimizers on the Flickr dataset. The resulting accuracies are: RMSprop 92%, SGD 12%, Adam 53%, and Adadelta 12%.
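
A comparison of this kind can be set up by retraining the same model under each optimizer; the Keras sketch below uses a placeholder classification head rather than the paper’s CNN+RNN captioner, and all layer sizes are assumptions.

```python
import tensorflow as tf

def build_model():
    # Placeholder head; the real model would be the CNN+RNN captioner.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(2048,)),
        tf.keras.layers.Dense(5000, activation="softmax")])

optimizers = {
    "RMSprop": tf.keras.optimizers.RMSprop(),
    "SGD": tf.keras.optimizers.SGD(),
    "Adam": tf.keras.optimizers.Adam(),
    "Adadelta": tf.keras.optimizers.Adadelta(),
}

for name, opt in optimizers.items():
    model = build_model()                      # fresh weights per optimizer
    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=..., validation_data=...)
    print(f"{name}: compiled and ready to train")
```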

  • Research Article
  • Cited by 119
  • 10.1155/2020/3062706
An Overview of Image Caption Generation Methods
  • Jan 9, 2020
  • Computational Intelligence and Neuroscience
  • Haoran Wang + 2 more

In recent years, with the rapid development of artificial intelligence, image captioning has gradually attracted the attention of many researchers in the field and has become an interesting and arduous task. Image captioning, automatically generating natural language descriptions according to the content observed in an image, is an important part of scene understanding, combining knowledge of computer vision and natural language processing. Its applications are extensive and significant, for example in human-computer interaction. This paper summarizes the related methods and focuses on the attention mechanism, which plays an important role in computer vision and has recently been widely used in image caption generation tasks. The advantages and shortcomings of these methods are discussed, and the commonly used datasets and evaluation criteria in this field are provided. Finally, the paper highlights some open challenges in the image captioning task.

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-030-60633-6_18
Lightweight Image Super-resolution with Local Attention Enhancement
  • Jan 1, 2020
  • Yunchu Yang + 3 more

In recent years, methods based on convolutional neural networks (CNNs) have been the mainstream in single image super-resolution (SISR). Although these methods achieve excellent performance, their massive number of parameters and heavy computation limit their application. The channel attention (CA) mechanism, which can enhance network performance, has also been widely used in the SR task recently; however, it was introduced from high-level vision tasks, and its original design does not consider the specificity of the SR task. To address these issues, we propose a lightweight expansion and distillation residual network (EDRN) for image super-resolution. Through diverse use of different feature channels and different convolution kernel sizes, our network effectively reduces the number of parameters while achieving superior performance. To further explore the potential of channel-wise attention in the SR task, we develop a novel plug-and-play local channel attention enhancement strategy (LCAES) that lets the network better exploit the local features of the image. Comprehensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against state-of-the-art SR algorithms in terms of visual quality, reconstruction accuracy, and parameter count.

  • Conference Article
  • Cited by 2
  • 10.1117/12.2599421
Cross-layer channel attention mechanism for convolutional neural networks
  • Jun 30, 2021
  • Ying He + 2 more

Recently, channel attention mechanisms have been widely used to improve the performance of convolutional neural networks. However, most channel attention mechanisms applied to backbone convolutional neural networks in computer vision use the globally pooled features of each block’s output to obtain the attention weights of the corresponding channels, ignoring the spatial information of the original features and the potential relationships between adjacent layers. To address this insufficient use of spatial information and the inability to adaptively learn the potential associations of all features in a block before producing channel attention weights, we propose a new Cross-layer Channel Attention Mechanism (CCAM), in which a matrix with spatial information replaces the global pooling operation; it takes the input and output features of each block as inputs and simultaneously outputs the channel attention weights of the corresponding features. Compared with other attention mechanisms, CCAM has three advantages: first, it makes full use of the spatial information of each layer of features; second, it encourages feature reuse and fusion; third, it is better at discovering the potential relationships between the features of different layers in a block. Our simulation results demonstrate that CCAM can effectively extract the attention weights of different layers and achieve better performance on CIFAR-10, CIFAR-100, ImageNet-1K, MS COCO detection, and VOC detection, with small additional computational cost compared with the corresponding convolutional neural networks.

  • Conference Article
  • Cited by 10
  • 10.1109/rteict.2017.8256949
Effect of image colourspace on performance of convolution neural networks
  • May 1, 2017
  • K Sumanth Reddy + 2 more

Recently, the term deep learning has been creating a lot of interest in the fields of artificial intelligence, computer vision, and natural language processing. Convolutional Neural Networks (CNNs) in particular are giving state-of-the-art results in image recognition, scene understanding, object detection, image description, and related tasks. Generally, CNNs process images in the RGB colourspace even though many other colourspaces are available. In this paper, we study the effect of image colourspace on the performance of CNN models in recognizing the objects present in an image. We evaluate this on the CIFAR10 dataset by converting all the original RGB images into four other colourspaces: HLS, HSV, LUV, and YUV. To compare results, we trained AlexNet with a fixed set of parameters on all five colourspaces, including RGB. We observed that LUV is the best alternative to RGB for CNN models, with almost equal performance on the CIFAR10 test set, while YUV is the worst colourspace to use with CNN models.
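
The colourspace conversions the paper evaluates are available directly in OpenCV; the sketch below converts one RGB image (CIFAR-10-sized, random placeholder data) into the four alternatives before it would be fed to the same CNN.

```python
import cv2
import numpy as np

# One CIFAR-10-sized RGB image (placeholder pixel data).
rgb = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)

# The four alternative colourspaces tested against RGB.
variants = {
    "HLS": cv2.cvtColor(rgb, cv2.COLOR_RGB2HLS),
    "HSV": cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV),
    "LUV": cv2.cvtColor(rgb, cv2.COLOR_RGB2LUV),
    "YUV": cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV),
}
for name, img in variants.items():
    print(name, img.shape, img.dtype)   # same shape, reinterpreted channels
```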
