Overview on Image Captioning Techniques

doi:10.30534/ijeter/2021/15982021

Abstract

Image captioning is the process of assigning a meaningful caption to a given image with the help of Natural Language Processing (NLP) and Computer Vision techniques. Captioning an image first requires identifying the objects, their attributes, and the relationships among them in the image, and then generating a relevant description of the image. The task therefore requires both NLP and Computer Vision techniques. The complexity of finding the relationships between an object's attributes and its features makes this a challenging task. It is also difficult for a machine to emulate the human brain; however, research has made notable progress in this field and has made such problems easier to solve. The foremost aim of this survey paper is to describe several methods for image captioning. Its core contribution is to categorise the existing approaches, to discuss and classify their subcategories, and to discuss some of their strengths and limitations. The paper gives a theoretical analysis of image captioning methods and describes both earlier and more recent approaches. It is intended as a source of information for researchers who want an overview of the approaches developed so far in the field of image captioning.

Key words: Computer Vision, Deep Learning, Neural Network, NLP, Image Captioning, Multimodal Learning.

Full Text