Abstract

Image captioning frameworks usually follow an encoder-decoder paradigm, in which the encoder maps an image to abstract feature vectors and the decoder performs language modeling. Most prominent architectures today employ features from region proposals produced by an object detection module. In this work, we propose a novel architecture for image captioning: an object detection module integrated with a transformer architecture serves as the encoder, and GPT-2 (Generative Pre-trained Transformer) serves as the decoder. The encoder exploits the spatial relationships among the detected objects. We introduce this methodology for image caption generation in Hindi, which is widely spoken in South Asia, is India's official language, and is the world's third most spoken language. The proposed approach is evaluated against several baselines in terms of BLEU scores, and the results show that it outperforms them. Its efficacy in generating correct captions is further confirmed by human assessment of adequacy and fluency.
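To make the described pipeline concrete, the following is a minimal sketch in PyTorch with HuggingFace Transformers, assuming Faster R-CNN-style region features as encoder input. The `SpatialEncoder` class, the concatenated bounding-box encoding, and all layer sizes are illustrative assumptions rather than the paper's exact design, and a real Hindi captioning system would additionally need a Hindi-capable tokenizer and vocabulary for GPT-2.

```python
# Hypothetical sketch of the abstract's architecture: a transformer encoder
# over detected-object features, with GPT-2 as a cross-attending decoder.
# Details (box encoding, layer sizes) are assumptions, not the paper's spec.
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2LMHeadModel

class SpatialEncoder(nn.Module):
    """Transformer encoder over object-detection region features.

    Each region is represented by its appearance vector concatenated with
    its normalized bounding box, one simple way to expose spatial
    relationships among objects to self-attention (the paper's exact
    geometric encoding may differ).
    """
    def __init__(self, feat_dim=2048, box_dim=4, d_model=768, n_layers=3):
        super().__init__()
        self.proj = nn.Linear(feat_dim + box_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, region_feats, boxes):
        # region_feats: (B, R, feat_dim); boxes: (B, R, 4) scaled to [0, 1]
        x = self.proj(torch.cat([region_feats, boxes], dim=-1))
        return self.encoder(x)  # (B, R, d_model)

# GPT-2 decoder with cross-attention over the encoded regions; the
# cross-attention weights are newly initialized and must be trained.
config = GPT2Config.from_pretrained("gpt2", add_cross_attention=True)
decoder = GPT2LMHeadModel.from_pretrained("gpt2", config=config)
encoder = SpatialEncoder()

# Dummy batch: 36 detected regions, a caption of 12 token ids.
feats, boxes = torch.randn(1, 36, 2048), torch.rand(1, 36, 4)
tokens = torch.randint(0, config.vocab_size, (1, 12))

memory = encoder(feats, boxes)
out = decoder(input_ids=tokens, encoder_hidden_states=memory, labels=tokens)
print(out.loss)  # teacher-forced cross-entropy for caption training
```

Setting `d_model=768` matches GPT-2's hidden size, which the cross-attention layers require of the encoder states; training would proceed by minimizing `out.loss` over image-caption pairs.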
