Image Caption Generation Research Articles

The analysis of large-scale multimodal data has become very popular recently. Image captioning, whose goal is to describe the content of image with natural language automatically, is an essential and challenging task in artificial intelligence. Commonly, most existing image caption methods utilize the mixture of Convolutional Neural Network and Recurrent Neural Network framework. These methods either pay attention to global representation at the image level or only focus on the specific concepts, such as regions and objects. To make the most of characteristics about a given image, in this study, we present a novel model named Multilevel Attention Networks and Policy Reinforcement Learning for image caption generation. Specifically, our model is composed of a multilevel attention network module and a policy reinforcement learning module. In the multilevel attention network, the object-attention network aims to capture global and local details about objects, whereas the region-attention network obtains global and local features about regions. After that, a policy reinforcement learning algorithm is adopted to overcome the exposure bias problem in the training phase and solve the loss-evaluation mismatching problem at the caption generation stage. With the attention network and policy algorithm, our model can automatically generate accurate and natural sentences for any particular image. We carry out extensive experiments on the MSCOCO and Flickr30k data sets, demonstrating that our model is superior to other competitive methods.

Read full abstract

Abstract: When humans see an image, their brain can easily tell what the image is about, but a computer cannot do it easily. Computer vision researchers worked on this a lot and they considered it impossible until now! With the advancement in Deep learning techniques, availability of huge datasets and computer power, we can build models that can generate captions for an image. Image Caption Generator is a popular research area of Deep Learning that deals with image understanding and a language description for that image. Generating well-formed sentences requires both syntactic and semantic understanding of the language. Being able to describe the content of an image using accurately formed sentences is a very challenging task, but it could also have a great impact, by helping visually impaired people better understand the content of images. The biggest challenge is most definitely being able to create a description that must capture not only the objects contained in an image, but also express how these objects relate to each other. This paper uses Flickr_8K dataset and Flickr8k_text folder that contains Flickr8k.token which is the main file of our dataset that contains image name and their respective caption separated by newline(“\n”). CNN is used for extracting features from the image. We will use the pre-trained model Xception. LSTM will use the information from CNN to help generate a description of the image. In our Flickr8k_text folder, we have Flickr_8k.trainImages.txt file that contains a list of 6000 images names that we will use for training. After CNN-LSTM model is defined we give an image file as parameter through command prompt for testing image caption generator and it generates the caption of an image and its accuracy is observed by calculating bleu score for generated and reference captions. Keywords: Image Caption Generator, Convolutional Neural Network, Long Short-Term Memory, Bleu score, Flickr_8K

Read full abstract

Image Caption Generation Research Articles

Related Topics

Articles published on Image Caption Generation

Review on Image Caption Generation

Automatic Image Caption Generation Based on Some Machine Learning Algorithms

Learning cross-modality features for image caption generation

Variational joint self‐attention for image captioning

Diverse and styled image captioning using singular value decomposition‐based mixture of recurrent experts

A novel framework for automatic caption and audio generation

CapNet: An Encoder-Decoder based Neural Network Model for Automatic Bangla Image Caption Generation

Image Caption Generation Method Based on Knowledge Graph Guidance and Self-Attention Mechanism

ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING

Multilevel Attention Networks and Policy Reinforcement Learning for Image Caption Generation.

Image Caption Generator Using Deep Learning

Analysis of the Fuzziness of Image Caption Generation Models due to Data Augmentation Techniques

A Rapid Review of Image Captioning

Automatic Image Caption Generation Using CNN, RNN and LSTM

Two-Tier LSTM Model for Image Caption Generation

Image Captioning using CNN and LSTM

Detection and Recognition of Object for Image Captioning

Image Caption Generation and Comprehensive Comparison of Image Encoders

Natural language guided object retrieval in images

An encoder-decoder based framework for hindi image caption generation

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Image Caption Generation Research Articles

Related Topics

Articles published on Image Caption Generation

Review on Image Caption Generation

Automatic Image Caption Generation Based on Some Machine Learning Algorithms

Learning cross-modality features for image caption generation

Variational joint self‐attention for image captioning

Diverse and styled image captioning using singular value decomposition‐based mixture of recurrent experts

A novel framework for automatic caption and audio generation

CapNet: An Encoder-Decoder based Neural Network Model for Automatic Bangla Image Caption Generation

Image Caption Generation Method Based on Knowledge Graph Guidance and Self-Attention Mechanism

ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING

Multilevel Attention Networks and Policy Reinforcement Learning for Image Caption Generation.

Image Caption Generator Using Deep Learning

Analysis of the Fuzziness of Image Caption Generation Models due to Data Augmentation Techniques

A Rapid Review of Image Captioning

Automatic Image Caption Generation Using CNN, RNN and LSTM

Two-Tier LSTM Model for Image Caption Generation

Image Captioning using CNN and LSTM

Detection and Recognition of Object for Image Captioning

Image Caption Generation and Comprehensive Comparison of Image Encoders

Natural language guided object retrieval in images

An encoder-decoder based framework for hindi image caption generation