Abstract

The rapid advancement of image captioning has made it a pivotal area of research, aiming to mimic human-like understanding of visual content. This paper presents an approach that integrates attention mechanisms and object features into an image captioning model. Leveraging the Flickr8k dataset, the research explores the fusion of these components to enhance image comprehension and caption generation. The study also demonstrates the deployment of the model in a user-friendly application built with FastAPI and ReactJS, offering text-to-speech translation into multiple languages. The findings underscore the efficacy of this approach in advancing image captioning technology. The paper further outlines the construction of an image caption generator, employing a Convolutional Neural Network (CNN) for image feature extraction and a Long Short-Term Memory (LSTM) network for natural language generation.

Keywords—Convolutional Neural Networks, Long Short-Term Memory, Attention Mechanism, Transformer Architecture, Vision Transformers, Transfer Learning, Multimodal Fusion, Deep Learning Models, Pre-Trained Models, Image Processing Techniques
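The attention mechanism named in the abstract, scoring CNN region features against the LSTM decoder state at each generation step, can be sketched as follows. This is a minimal NumPy illustration of additive (Bahdanau-style) soft attention, not the paper's implementation; the function name, weight matrices, and dimensions are all hypothetical.

```python
import numpy as np

def soft_attention(region_feats, hidden, W_f, W_h, v):
    """Additive soft attention over CNN region features.

    region_feats: (R, D) - R spatial regions, each a D-dim CNN feature
    hidden:       (H,)   - current LSTM decoder hidden state
    W_f, W_h, v:  learned projections (here random, for illustration)
    Returns a context vector (D,) and attention weights (R,).
    """
    # Score each image region against the current decoder state
    scores = np.tanh(region_feats @ W_f + hidden @ W_h) @ v   # (R,)
    # Softmax (numerically stabilized) -> weights sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: weighted sum of region features,
    # fed to the LSTM when predicting the next caption word
    context = weights @ region_feats                          # (D,)
    return context, weights

# Toy dimensions: a 7x7 CNN feature grid (49 regions), 256-dim features
rng = np.random.default_rng(0)
R, D, H, A = 49, 256, 512, 128
feats = rng.standard_normal((R, D))
h = rng.standard_normal(H)
W_f = rng.standard_normal((D, A)) * 0.01
W_h = rng.standard_normal((H, A)) * 0.01
v = rng.standard_normal(A)

ctx, w = soft_attention(feats, h, W_f, W_h, v)
print(ctx.shape, w.shape)
```

At each decoding step the weights form a distribution over image regions, so the model attends to different parts of the image while emitting successive words of the caption.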
