Image Caption Generation Research Articles

Automatic caption generation from images is a fundamental and challenging problem at the intersection of computer vision and natural language processing. This paper surveys the techniques, methodologies, and advances in the field, reviewing state-of-the-art models, evaluation metrics, datasets, and applications. The survey begins with the underlying principles of image feature extraction and caption generation. It discusses in detail neural network architectures, including Convolutional Neural Networks (CNNs) and recurrent models such as Long Short-Term Memory (LSTM) networks, and explores how attention mechanisms and reinforcement learning strategies are integrated to improve the quality and relevance of generated captions. Evaluation metrics, both automated and human-centric, are examined as means of assessing generated captions quantitatively and qualitatively. The survey also highlights prominent datasets that have advanced research in this field and clarified its challenges and trends. It then discusses practical applications in which automatic caption generation plays a central role, including accessibility, multimedia indexing, and assistive technologies. The discussion concludes by outlining open challenges and future directions. Overall, the paper examines and contrasts diverse end-to-end learning frameworks for image captioning, using established evaluation metrics to assess their applicability across different research domains, and, alongside this comparative analysis, addresses future challenges in the domain.

Disease image classification systems play a crucial role in identifying disease categories in agriculture. However, current plant disease image classification methods only predict the disease category and do not explain the characteristics of the predicted disease images. To address this limitation, this paper employs image description generation technology to produce distinct descriptions for different plant disease categories. A two-stage model called DIC-Transformer, which encompasses three tasks (detection, interpretation, and classification), is proposed. In the first stage, Faster R-CNN, with the Swin Transformer as the backbone, is used to detect the diseased area and generate the feature vector of the diseased image. In the second stage, the model uses the Transformer to generate image captions. It then generates an image feature vector weighted by text features, which improves the performance of image classification in the subsequent classification decoder. Additionally, a dataset containing text and visualizations for agricultural diseases (ADCG-18) was compiled. The dataset contains images of 18 diseases and descriptive information about their characteristics. Using ADCG-18, the DIC-Transformer was compared to 11 existing classical caption generation methods and 10 image classification models. The evaluation indicators for captions include BLEU-1 to BLEU-4, CIDEr-D, and ROUGE. The values of BLEU-1, CIDEr-D, and ROUGE were 0.756, 450.51, and 0.721, respectively. These scores were 0.01, 29.55, and 0.014 higher than those of the highest-performing comparison model, Fc. The classification evaluation metrics include accuracy, recall, and F1 score, with accuracy at 0.854, recall at 0.854, and F1 score at 0.853. These scores were 0.024, 0.078, and 0.075 higher than those of the highest-performing comparison model, MobileNetV2. The results indicate that the DIC-Transformer outperforms the comparison models in both classification and caption generation.
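
For readers unfamiliar with the caption metrics named above, the following is a minimal sketch of how a BLEU-1 score can be computed for a single caption: clipped unigram precision multiplied by a brevity penalty. It is an illustration under stated assumptions, not the evaluation code used in the paper. The function name `bleu1`, the lowercase whitespace tokenization, and the example captions are all hypothetical, and published results are normally computed at corpus level with standard tooling.

```python
import math
from collections import Counter


def bleu1(candidate: str, references: list[str]) -> float:
    """Sentence-level BLEU-1 (hypothetical helper, simple whitespace tokens)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each token, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)

    # Clipped unigram precision: a candidate token is credited at most
    # as many times as it appears in the most generous reference.
    cand_counts = Counter(cand)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate
    # length (ties broken toward the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    return bp * precision


if __name__ == "__main__":
    # Made-up captions, for illustration only.
    refs = ["brown spots on the leaf"]
    print(bleu1("brown spots on the leaf", refs))
    print(bleu1("brown spots", refs))
```

A caption that matches its reference exactly scores 1.0, while a very short caption made only of correct words is still penalized through the brevity term, which is why BLEU is usually reported together with metrics such as CIDEr-D and ROUGE.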

Articles published on Image Caption Generation

A comprehensive review of image caption generation

Diffusion-Cap: A diffusion model for image captioning

Hybrid explainable image caption generation using image processing and natural language processing

IMAGE CAPTION GENERATOR USING CNN AND LSTM

Image Caption Generator Using CNN and LSTM

Enhancing image caption generation through context-aware attention mechanism

Automatic Audio and Image Caption Generation with Deep Learning

A transformer-based Urdu image caption generation

Enhancing Image Captioning Using Deep Convolutional Generative Adversarial Networks

Chinese image captioning with fusion encoder and visual keyword search

Image Caption Prediction Using Deep Learning

Image Caption Generator

Research on image caption generation method based on multi-modal pre-training model and text mixup optimization

A Comparative Study of Feature Extraction Models for Image Caption Generation

Image Caption Generator using Deep Learning

Detection and Caption Generation of Image Using Deep Learning

DeepLens: Integrating Deep Learning for Image Captioning and Hashtag Generation

Exploring a Spectrum of Deep Learning Models for Automated Image Captioning: A Comprehensive Survey

Image Captioning: A Comprehensive Review

DIC-Transformer: interpretation of plant disease classification results using image caption generation technology
