Abstract

With the advent of deep learning in recent years, the integration of computer vision with natural language processing has garnered considerable attention. Generating descriptions from images is one of the most intriguing and actively studied areas of machine learning, and it faces a number of obstacles, particularly when describing images in languages other than English. In image captioning, a computer is trained to understand the visual content of an image and then generate a description based on the image features and reference sentences. An application that automatically describes events in a user's environment and renders them as a caption or message could make a significant contribution to society. This paper presents a multi-layered CNN-LSTM neural network model that recognizes the objects in images and generates Hindi captions for them. In addition, a variety of models were trained by adjusting hyperparameters and the number of hidden layers in order to find the optimum model and maximize the likelihood of the resulting Hindi description. When evaluated against existing work in this field, our model improves the unigram BLEU score by 34.64% and the bigram BLEU score by 29.13%. Image captioning in Hindi has a multitude of applications in today's society and can also provide a user-friendly interface for Hindi speakers.
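
To illustrate the kind of architecture the abstract describes, the sketch below shows a minimal merge-style CNN-LSTM captioning model in Keras. The feature dimension, vocabulary size, caption length, and layer sizes are illustrative assumptions; the paper's exact number of hidden layers and hyperparameter settings are not given in the abstract.

```python
# Minimal sketch of a merge-style CNN-LSTM captioning model, assuming
# pre-extracted 2048-d CNN image features (e.g. from InceptionV3).
# All dimensions below are illustrative, not the paper's configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 8000   # assumed Hindi vocabulary size
MAX_LEN = 35        # assumed maximum caption length
EMBED_DIM = 256
UNITS = 256

# Image branch: project the CNN feature vector into the decoder's space.
image_input = layers.Input(shape=(2048,))
img = layers.Dropout(0.5)(image_input)
img = layers.Dense(UNITS, activation="relu")(img)

# Text branch: embed the partial caption and encode it with an LSTM.
caption_input = layers.Input(shape=(MAX_LEN,))
txt = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)
txt = layers.Dropout(0.5)(txt)
txt = layers.LSTM(UNITS)(txt)

# Merge the two modalities and predict the next word of the caption.
merged = layers.add([img, txt])
merged = layers.Dense(UNITS, activation="relu")(merged)
output = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, such a model is queried word by word: the caption generated so far is fed back through the text branch until an end-of-sentence token is produced.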
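
The unigram and bigram BLEU scores used for comparison can be computed as in the minimal sketch below, which uses NLTK's corpus_bleu. The Hindi reference and generated captions shown are placeholder examples, not the paper's data.

```python
# Minimal sketch of unigram and bigram BLEU evaluation with NLTK.
# The captions below are illustrative placeholders.
from nltk.translate.bleu_score import corpus_bleu

# One list of tokenized reference captions per image.
references = [[["एक", "कुत्ता", "घास", "पर", "दौड़", "रहा", "है"]]]
# One tokenized model-generated caption per image.
hypotheses = [["एक", "कुत्ता", "घास", "में", "दौड़", "रहा", "है"]]

bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))    # unigram BLEU
bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0))  # bigram BLEU
print(f"BLEU-1: {bleu1:.4f}  BLEU-2: {bleu2:.4f}")
```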
