Abstract

With the advent of deep learning in recent years, the integration of computer vision with natural language processing has garnered considerable attention. Generating descriptions from images is one of the most intriguing and actively studied areas of machine learning, and it faces a number of obstacles, particularly when describing images in languages other than English. In image captioning, a computer is trained to understand the visual content of an image and then generate a description based on the image features and reference sentences. An application that automatically describes events in a user's environment and renders them as a caption or message could make a significant contribution to society. This paper presents a multi-layered CNN-LSTM neural network model that recognizes the objects in images and generates Hindi captions for them. In addition, a variety of models were trained by adjusting hyperparameters and the number of hidden layers in order to find the optimum model and maximize the likelihood of the resulting Hindi description. When evaluated against existing work in this field, our model improves the unigram BLEU score by 34.64% and the bigram BLEU score by 29.13%. Image captioning in Hindi has a multitude of applications in today's society and can also provide a user-friendly interface for Hindi speakers.
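
To illustrate the kind of architecture the abstract describes, the sketch below shows a minimal merge-style CNN-LSTM captioning model in Keras. The feature dimension, vocabulary size, caption length, and layer sizes are illustrative assumptions; the paper's exact number of hidden layers and hyperparameter settings are not given in the abstract.

```python
# Minimal sketch of a merge-style CNN-LSTM captioning model, assuming
# pre-extracted 2048-d CNN image features (e.g. from InceptionV3).
# All dimensions below are illustrative, not the paper's configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 8000   # assumed Hindi vocabulary size
MAX_LEN = 35        # assumed maximum caption length
EMBED_DIM = 256
UNITS = 256

# Image branch: project the CNN feature vector into the decoder's space.
image_input = layers.Input(shape=(2048,))
img = layers.Dropout(0.5)(image_input)
img = layers.Dense(UNITS, activation="relu")(img)

# Text branch: embed the partial caption and encode it with an LSTM.
caption_input = layers.Input(shape=(MAX_LEN,))
txt = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)
txt = layers.Dropout(0.5)(txt)
txt = layers.LSTM(UNITS)(txt)

# Merge the two modalities and predict the next word of the caption.
merged = layers.add([img, txt])
merged = layers.Dense(UNITS, activation="relu")(merged)
output = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model is queried word by word: the caption generated so far is fed back through the text branch until an end-of-sentence token is produced.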
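
The unigram and bigram BLEU scores used for comparison can be computed as in the minimal sketch below, which uses NLTK's corpus_bleu. The Hindi reference and generated captions shown are placeholder examples, not the paper's data.

```python
# Minimal sketch of unigram and bigram BLEU evaluation with NLTK.
# The captions below are illustrative placeholders.
from nltk.translate.bleu_score import corpus_bleu

# One list of tokenized reference captions per image.
references = [[["एक", "कुत्ता", "घास", "पर", "दौड़", "रहा", "है"]]]
# One tokenized model-generated caption per image.
hypotheses = [["एक", "कुत्ता", "घास", "में", "दौड़", "रहा", "है"]]

bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))    # unigram BLEU
bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0))  # bigram BLEU
print(f"BLEU-1: {bleu1:.4f}  BLEU-2: {bleu2:.4f}")
```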
