Abstract

Image captioning is the task of automatically describing the content of an image, connecting computer vision and natural language processing. In this paper, we compare five popular convolutional neural network architectures: VGG16, InceptionV3, ResNet50, DenseNet201, and Xception, using each as the pre-trained feature extractor for the same image captioning model. The encoder-decoder model is a recurrent neural network architecture for sequence-to-sequence prediction problems; the encoder is typically a convolutional neural network pre-trained on a large dataset. Many different encoder-decoder architectures have been used for caption generation, but evaluating their relative performance is difficult. In this paper, we train each of the VGG16, ResNet50, InceptionV3, DenseNet201, and Xception models with categorical cross-entropy as the loss function and RMSprop as the optimizer.
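The categorical cross-entropy loss mentioned above can be sketched in NumPy; this is a minimal illustration of the loss the paper says it uses, not the authors' implementation, and the two-word vocabulary and probability values are invented for the example.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot target words, shape (batch, vocab_size)
    y_pred: predicted word probabilities, shape (batch, vocab_size)
    """
    # Clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)) / y_true.shape[0])

# Toy example: a two-word vocabulary and two caption positions.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_crossentropy(y_true, y_pred)
# loss = -(ln 0.9 + ln 0.8) / 2 ≈ 0.164
```

In a captioning decoder this loss is averaged over every predicted word of every caption; an optimizer such as RMSprop then updates the decoder weights to reduce it.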
