Abstract

With the rise of deep learning in recent years, the convergence of computer vision and natural language processing has attracted considerable interest. Image captioning is a deep learning application at this intersection that has advanced rapidly. The ability to train a computer to describe the content of an image or an environment as a person would has significant implications for computer vision, economics, and a variety of other fields. Over the years, this has remained a difficult problem in artificial intelligence, and many researchers have made substantial progress on it. In this paper, we describe various deep neural network-based image caption generation models, including RNN-based decoding, CNN-based encoding, and reinforcement-based frameworks. We also generate captions for a set of images and compare several feature extraction and encoder models to determine which provides the best accuracy and produces the desired outcome. Extensive experiments on the dataset show that the proposed framework outperforms the existing encoder-based approach.
