Abstract

Automatic image caption generation is a challenging AI problem because it draws on techniques from several computer science domains, such as computer vision and natural language processing. Deep learning techniques have demonstrated outstanding results in many applications. Data augmentation, which expands the amount and variety of training data available to learning models without the burden of collecting new data, is a promising direction in deep learning. Generating a textual description for a given image remains a difficult task for computers. Deep learning now plays a significant role in processing visual data through Convolutional Neural Networks (CNNs). In this study, CNNs are employed to train prediction models for automatic image caption generation. The proposed method uses data augmentation to overcome the instability of well-known image caption generation models. The Flickr8k dataset is used in the experimental work, and the BLEU score is applied to evaluate the reliability of the proposed method. The results show that the outcomes of the proposed method are more stable than those of the compared methods.

Highlights

  • Auto generation of captions for images is quite a complex task for computers

  • Understanding the objects in an image is not the only task for computers; they must also understand the relations among these objects and translate those relations into natural language, as a human would

  • Using (1), we develop the formula given in (2), where n is the number of epochs, k is the ID of the image, and j identifies the BLEU score n-grams, with 1 ≤ j ≤ 4
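The paper's formulas (1) and (2) are not reproduced on this page, but the index j in the last highlight ranges over the standard cumulative BLEU-1 through BLEU-4 scores. As a minimal sketch (not the authors' exact formulation), the usual BLEU-j computation combines clipped n-gram precisions for n = 1..j with a brevity penalty:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: candidate counts are capped by the
    maximum count of each n-gram over all references."""
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, j):
    """Cumulative BLEU-j (1 <= j <= 4): brevity penalty times the
    geometric mean of modified precisions for n = 1..j."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, j + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    r = len(min(references, key=lambda ref: abs(len(ref) - c)))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / j)

# A candidate identical to its reference scores 1.0 at every j.
reference = "a black dog runs across the green grass".split()
candidate = "a black dog runs across the green grass".split()
for j in range(1, 5):
    print(f"BLEU-{j}: {bleu(candidate, [reference], j):.3f}")
```

In a setup like the paper's, each generated caption (image ID k) would be scored against its reference captions at each epoch n, yielding one BLEU-j value per (n, k, j) triple.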


Introduction

Automatic generation of captions for images is quite a complex task for computers. Many major companies, such as Google, Microsoft, and Apple, are working on improving image analysis. Understanding the objects in an image is not the only task for computers; they must also understand the relations among these objects and translate those relations into natural language, as a human would. This task is quite expensive in terms of computational cost. The story began in 2010, when [1] proposed a method to describe an image in a sentence.
