Abstract
In recent years, there has been growing interest among researchers in image captioning, the task of generating one or more descriptions of an image that closely resemble those a human would write. Most existing studies in this area focus on the English language, using CNN and RNN variants as encoder and decoder models, often enhanced with attention mechanisms. Although Bengali is the fifth most-spoken native language and the seventh most widely spoken language overall, it has received far less attention than resource-rich languages such as English. This study aims to bridge that gap by introducing a novel approach to image captioning in Bengali. By combining state-of-the-art convolutional neural networks, namely EfficientNetV2S, ConvNeXtSmall, and InceptionResNetV2, with a modified Transformer, the proposed system achieves computational efficiency while generating accurate, contextually relevant captions. In addition, Bengali text-to-speech synthesis is incorporated into the framework to help visually impaired Bengali speakers understand their surroundings and visual content more effectively. The model is evaluated on a chimeric dataset that pairs Bengali descriptions from the Ban-Cap dataset with the corresponding images from the Flickr 8k dataset. Using the EfficientNet encoder, the proposed model attains METEOR, CIDEr, and ROUGE scores of 0.34, 0.30, and 0.40, respectively, while its BLEU scores for unigram, bigram, trigram, and four-gram matching are 0.66, 0.59, 0.44, and 0.26. The results demonstrate that the proposed approach produces precise image descriptions and outperforms other state-of-the-art models at generating Bengali captions.
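The abstract describes a pipeline in which a pretrained CNN encodes the image and a Transformer decoder generates the Bengali caption. The sketch below is a minimal, illustrative Keras outline of that general pattern, not the authors' implementation; the vocabulary size, caption length, embedding width, input resolution, and layer counts are assumed placeholder values.

# Minimal, illustrative sketch (not the authors' code) of the described pipeline:
# a frozen EfficientNetV2S image encoder feeding a small Transformer-style decoder.
# VOCAB_SIZE, MAX_LEN, and EMBED_DIM are assumed placeholder values.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000   # assumed Bengali token vocabulary size
MAX_LEN = 30          # assumed maximum caption length (in tokens)
EMBED_DIM = 512       # assumed embedding / model width

# Image encoder: pretrained EfficientNetV2S used as a fixed feature extractor.
cnn = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", input_shape=(384, 384, 3))
cnn.trainable = False

image_in = layers.Input(shape=(384, 384, 3), name="image")
fmap = cnn(image_in)                                   # (H', W', C) feature map
patches = layers.Reshape((-1, fmap.shape[-1]))(fmap)   # flatten the spatial grid
enc_out = layers.Dense(EMBED_DIM)(patches)             # project to decoder width

# Decoder: causal self-attention over caption tokens, cross-attention to image features.
tokens_in = layers.Input(shape=(MAX_LEN,), dtype="int32", name="tokens")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens_in)
x = layers.LayerNormalization()(
    x + layers.MultiHeadAttention(num_heads=8, key_dim=64)(x, x, use_causal_mask=True))
x = layers.LayerNormalization()(
    x + layers.MultiHeadAttention(num_heads=8, key_dim=64)(x, enc_out))
x = layers.Dense(EMBED_DIM, activation="relu")(x)
logits = layers.Dense(VOCAB_SIZE)(x)                   # next-token scores per position

model = tf.keras.Model([image_in, tokens_in], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

At inference time, captions would be generated autoregressively from the decoder's next-token logits (for example with greedy or beam search), and the resulting Bengali text could then be passed to a text-to-speech engine, as the framework described in the abstract does for visually impaired users.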