With the rapid development of Deep learning, AI along with Computer Vision and Natural Language processing Image caption has become an interesting and complex task. Image caption generation is the process of generating textual description of the given image and it is a challenging task because it consists of apprehension of objects. If the machine will be programmed to accurately describe an image or environment like human vision, it will be highly beneficial for robotic vision, business and many more. In order to generate an effective description of the image, the machine needs to detect, recognize objects as well as understand the scene type or location, object properties, their relationships and their interactions with each other. In this paper, we focus on advanced image captioning techniques such as CNN (Convolutional Neural Network)-LSTM(Long Short Term Memory) to generate meaningful captions. and the advantages and limitations of each method are discussed.
Read full abstract