Abstract

With the rapid growth of deep learning technologies, automatic image description generation has become an interesting problem at the intersection of computer vision and natural language generation. It helps improve access to photo collections on social media and provides guidance for visually impaired people. Deep neural networks currently play a vital role in computer vision and natural language processing tasks. The main objective of this work is to generate grammatically correct descriptions of images using the semantics of the trained captions. An encoder-decoder framework based on deep neural networks is used to implement the image description generation task: the encoder is an image parsing module and the decoder is a surface realization module. The framework uses a Densely Connected Convolutional Network (DenseNet) for image encoding and a Bidirectional Long Short-Term Memory (BLSTM) network for language modeling; their outputs are fed to a bidirectional LSTM in the caption generator, which is trained to optimize the log-likelihood of the target description of the image. Most existing image captioning works use RNNs and LSTMs for language modeling. RNNs are computationally expensive and have limited memory, and an LSTM processes its input in only one direction; the BLSTM used here avoids both problems. In this work, the best combination of words during caption generation is selected using beam search and a game-theoretic search, and the results show that the game-theoretic search outperforms beam search. The model was evaluated on the standard benchmark Flickr8k dataset, with the Bilingual Evaluation Understudy (BLEU) score as the evaluation measure. A new measure called GCorrect was used to check the grammatical correctness of the generated descriptions. The proposed model achieves clear improvements over previous methods on the Flickr8k dataset, producing grammatically correct sentences with a GCorrect of 0.040625 and a BLEU score of 69.96%.
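As a concrete illustration of the encoder-decoder framework described above, the sketch below outlines a merge-style caption model with a DenseNet image encoder and a Bidirectional LSTM language model in Keras. This is a minimal sketch rather than the authors' released code: the layer sizes, vocabulary size, maximum caption length, and the choice of DenseNet201 with global average pooling are assumptions made for illustration.

```python
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.layers import (Input, Dense, Embedding, LSTM,
                                     Bidirectional, Dropout, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed vocabulary size after preprocessing
MAX_LEN = 34        # assumed maximum caption length for Flickr8k

# Image encoder: DenseNet201 with global average pooling yields a
# 1920-dimensional feature vector per image (features extracted offline).
feature_extractor = DenseNet201(weights='imagenet', include_top=False,
                                pooling='avg')

# (1) Project the image feature into a 256-d joint space.
img_input = Input(shape=(1920,))
img_emb = Dropout(0.5)(img_input)
img_emb = Dense(256, activation='relu')(img_emb)

# (2) Encode the partial caption with an embedding + Bidirectional LSTM.
seq_input = Input(shape=(MAX_LEN,))
seq_emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_input)
seq_emb = Dropout(0.5)(seq_emb)
seq_emb = Bidirectional(LSTM(128))(seq_emb)  # 128 units per direction -> 256-d

# (3) Merge both streams and predict the next word; training with
# categorical cross-entropy maximizes the log-likelihood of the target caption.
merged = add([img_emb, seq_emb])
merged = Dense(256, activation='relu')(merged)
next_word = Dense(VOCAB_SIZE, activation='softmax')(merged)

caption_model = Model(inputs=[img_input, seq_input], outputs=next_word)
caption_model.compile(loss='categorical_crossentropy', optimizer='adam')
```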

Highlights

  • The World Wide Web is a data store with a vast collection of images

  • The proposed image captioning system achieved state-of-the-art performance on the Flickr8k dataset, measured with the Bilingual Evaluation Understudy (BLEU) score (see the evaluation sketch after this list)

  • Various experiments were conducted with different deep Convolutional Neural Networks (CNNs) for encoding and different Recurrent Neural Networks (RNNs) for decoding
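The sketch below shows how a corpus-level BLEU score of the kind reported above can be computed with NLTK's corpus_bleu. The example captions and the use of smoothing are illustrative assumptions, not the paper's evaluation script.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# For each test image: a list of tokenized human reference captions...
references = [
    [['a', 'dog', 'runs', 'through', 'the', 'grass'],
     ['a', 'brown', 'dog', 'is', 'running', 'in', 'a', 'field']],
]
# ...and the single tokenized caption generated by the model.
hypotheses = [['a', 'dog', 'is', 'running', 'through', 'a', 'field']]

smooth = SmoothingFunction().method1
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0),
                    smoothing_function=smooth)
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=smooth)
print(f'BLEU-1: {bleu1:.4f}  BLEU-4: {bleu4:.4f}')
```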


Summary

Introduction

The World Wide Web is a data store with a vast collection of images, and image searching is a challenging task. Combining the output of computer vision with language models is well suited to the image description generation process, in which surface realization is the step that produces the textual description. The proposed model differs from existing models in that it learns the semantics of the sentences using the bidirectional nature of the BLSTM and maps sentence features to complex image features. To achieve this objective, a framework for the automatic generation of image descriptions with two major components was proposed: one BLSTM is implemented for language modeling and the other for caption generation.
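The abstract notes that the best combination of words during caption generation is selected with beam search and with a game-theoretic search. The sketch below illustrates only the beam-search side of that comparison; the trained model, the word-index mappings, the 'startseq'/'endseq' tokens, and the padding scheme are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

def beam_search_caption(model, photo_feat, word2idx, idx2word,
                        max_len=34, beam_width=3):
    """Decode one caption by keeping the beam_width most likely partial
    captions at every step (a sketch of the beam-search baseline; the paper
    also evaluates a game-theoretic alternative to this step)."""
    start, end = word2idx['startseq'], word2idx['endseq']
    beams = [([start], 0.0)]            # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:          # finished caption: carry it forward
                candidates.append((seq, score))
                continue
            padded = np.zeros((1, max_len))
            padded[0, :len(seq)] = seq
            probs = model.predict([photo_feat, padded], verbose=0)[0]
            for w in np.argsort(probs)[-beam_width:]:    # top-k next words
                candidates.append((seq + [int(w)],
                                   score + np.log(probs[w] + 1e-12)))
        # Keep only the beam_width highest-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best_seq = beams[0][0]
    return ' '.join(idx2word[w] for w in best_seq if w not in (start, end))
```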

Related Work
Image Captioning
Layout Based Approaches
Deep Neural Network Based Approaches
Deep Neural Network
Game Theory
Cooperative Game Theory
System Architecture
Image Model
Densenet
Dense Layer
Language Model
Embed Layer
Bidirectional LSTM
Time Distributed Dense Layer
Caption Model
Game Theoretic Algorithm for Caption Generation
Implementation
Training Details
Datasets
Preprocessing
Performance Evaluation of the Model
Grammatical Correctness of the Generated Description
Conclusions