Abstract
Image captioning aims to describe an image with the best possible choice of words, conveying a semantic, relatable account of the depicted scene. Different models can be used to accomplish this arduous task depending on the context and the requirements of what needs to be achieved. An encoder–decoder model that takes image feature vectors as input to the encoder is often regarded as one of the appropriate architectures for the captioning process. In the proposed work, a dual-modal transformer is used, which captures intra- and inter-modal interactions simultaneously within a single attention block. The transformer architecture is quantitatively evaluated on the publicly available Microsoft Common Objects in Context (MS COCO) dataset, yielding a Bilingual Evaluation Understudy (BLEU)-4 score of 85.01. The efficacy of the model is further evaluated on the Flickr8k, Flickr30k and MS COCO datasets, and the results are compared and analysed against state-of-the-art methods. The results show that the proposed model outperforms conventional models such as the encoder–decoder model and the attention model.
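To make the dual-modal attention idea concrete, the following is a minimal sketch, not the authors' implementation, of an attention block that computes intra-modal self-attention and inter-modal cross-attention between image-region features and word features within the same block. The class name, fusion scheme, dimensions and shapes are illustrative assumptions only.

```python
# Minimal sketch (assumed structure, not the paper's code) of a dual-modal
# attention block combining intra- and inter-modal interactions in one block.
import torch
import torch.nn as nn

class DualModalAttentionBlock(nn.Module):  # hypothetical name
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        # Intra-modal: each modality attends to itself.
        self.self_attn_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn_txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Inter-modal: each modality attends to the other.
        self.cross_attn_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn_txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(d_model)
        self.norm_txt = nn.LayerNorm(d_model)

    def forward(self, img_feats, txt_feats):
        # Intra-modal interactions (self-attention within each modality).
        img_intra, _ = self.self_attn_img(img_feats, img_feats, img_feats)
        txt_intra, _ = self.self_attn_txt(txt_feats, txt_feats, txt_feats)
        # Inter-modal interactions (cross-attention between modalities),
        # computed in the same block from the same inputs.
        img_inter, _ = self.cross_attn_img(img_feats, txt_feats, txt_feats)
        txt_inter, _ = self.cross_attn_txt(txt_feats, img_feats, img_feats)
        # Fuse both interaction paths with residual connections (assumed fusion).
        img_out = self.norm_img(img_feats + img_intra + img_inter)
        txt_out = self.norm_txt(txt_feats + txt_intra + txt_inter)
        return img_out, txt_out

if __name__ == "__main__":
    # Example with hypothetical shapes: 36 image regions, 20 caption tokens.
    block = DualModalAttentionBlock()
    img = torch.randn(2, 36, 512)   # (batch, regions, d_model)
    txt = torch.randn(2, 20, 512)   # (batch, tokens, d_model)
    img_out, txt_out = block(img, txt)
    print(img_out.shape, txt_out.shape)
```

In this sketch the two interaction paths share one block, so each modality's updated representation reflects both its own context and the other modality's context in a single pass; the residual-sum fusion is an assumption standing in for whatever gating or concatenation the paper actually uses.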