Abstract

Automatically describing the content of an image has long been a central task in artificial intelligence research. This paper presents an automatic caption generator built from a CNN encoder and an RNN-LSTM decoder, drawing on recent advances in image processing and machine translation. The model was trained and evaluated on the Flickr8k dataset, and BLEU scores were used to assess the quality of the generated captions, allowing them to be rated from good to poor. Potential applications of this approach include virtual assistants, image indexing, social media, assistive technology for the blind, app recommendations, and many other areas.

Key Words: Automatic Caption Generator, CNN, RNN-LSTM, Computer vision, Machine translation, Flickr8k dataset, BLEU scores, Performance evaluation, Virtual assistants, Image indexing, Social media, Assistive technology for the blind, App recommendations.
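The abstract does not spell out how BLEU is computed; as an illustration only, here is a minimal sketch of unigram BLEU (BLEU-1) in plain Python, using clipped unigram precision with the standard brevity penalty. The function name and the example sentences are our own, not taken from the paper; full BLEU as typically reported also averages higher-order n-gram precisions and uses multiple references.

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Illustrative unigram BLEU: clipped unigram precision
    multiplied by a brevity penalty for short candidates."""
    cand = candidate.split()
    ref = reference.split()
    ref_counts = Counter(ref)
    # Clip each candidate word's count by its count in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: 1 if the candidate is at least as long as the
    # reference, otherwise exp(1 - |ref| / |cand|).
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# A perfect match scores 1.0; a shorter but fully matching
# candidate is penalized only by the brevity penalty.
print(bleu1("a dog runs on the grass", "a dog runs on the grass"))  # 1.0
print(bleu1("a dog runs on grass", "a dog runs on the grass"))
```

In a caption-generation setting, `candidate` would be a model-produced caption and `reference` a human-written one; higher scores indicate closer agreement.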
