Abstract

In recent years, the confluence of computer vision and natural language processing, propelled by advances in deep learning, has attracted significant interest. Among its notable applications, image captioning stands out, enabling computers to describe visual content in one or more sentences. This task entails not only identifying objects and scenes but also analyzing their attributes, states, and interrelations, and then generating meaningful descriptions that capture the high-level semantics of an image. Although inherently complex, image captioning has seen remarkable progress through the efforts of numerous researchers. This paper offers a comprehensive review of three prominent image captioning methodologies built on deep neural networks: CNN-RNN, CNN-CNN, and reinforcement-learning-based frameworks. Each approach is accompanied by a detailed analysis of representative works, elucidating their respective contributions. The evaluation metrics pertinent to these methods are then discussed, followed by a synthesis of their advantages and primary challenges. Through this examination, the review aims to provide insight into the evolving landscape of image captioning and to highlight avenues for further exploration and innovation.
