Abstract

Automatic image caption generation is one of the core problems in deep learning. Data augmentation increases the amount of data at hand by transforming the existing training images with techniques such as flipping, rotation, zooming and brightening. In this work, we build an image captioning model and test its robustness against all the major types of image augmentation techniques. The results show the fuzziness of the model: for the same image, a different caption is produced each time a different augmentation technique is applied. We also report how the performance of the model changes after these augmentation techniques are applied. The Flickr8k dataset is used for this study, with the BLEU score as the evaluation metric for the image captioning model.
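The augmentation techniques named above can be illustrated on a raw image array. The following is a minimal NumPy sketch, not the authors' pipeline: the source does not say which library, kernel or parameter values were used, so the function names and defaults (a zoom factor of 1.5, a brightness offset of 40, a 3 x 3 box blur standing in for the blurring mentioned in the highlights) are hypothetical choices for illustration.

```python
import numpy as np


def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W x C image left to right."""
    return img[:, ::-1, :]


def rotate_90(img: np.ndarray, k: int = 1) -> np.ndarray:
    """Rotate by k * 90 degrees counter-clockwise (swaps H and W for odd k)."""
    return np.rot90(img, k=k, axes=(0, 1))


def zoom_center(img: np.ndarray, factor: float = 1.5) -> np.ndarray:
    """Crop the central 1/factor region and resize it back (nearest neighbour)."""
    h, w = img.shape[:2]
    ch, cw = max(1, int(h / factor)), max(1, int(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h  # map each output row to a crop row
    cols = np.arange(w) * cw // w  # map each output column to a crop column
    return crop[rows][:, cols]


def adjust_brightness(img: np.ndarray, delta: int = 40) -> np.ndarray:
    """Add a constant offset to every pixel and clip to the uint8 range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)


def box_blur(img: np.ndarray, radius: int = 1) -> np.ndarray:
    """Replace each pixel by the mean of its (2r+1) x (2r+1) neighbourhood."""
    h, w = img.shape[:2]
    k = 2 * radius + 1
    padded = np.pad(img.astype(np.float32),
                    ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros((h, w, img.shape[2]), dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return np.round(out / (k * k)).astype(np.uint8)


# Usage on a random stand-in image (not a Flickr8k sample).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
variants = {
    "original": image,
    "flip": flip_horizontal(image),
    "rotate": rotate_90(image),
    "zoom": zoom_center(image),
    "brighten": adjust_brightness(image),
    "blur": box_blur(image),
}
for name, variant in variants.items():
    print(name, variant.shape)  # "rotate" has H and W swapped
```

Each variant would then be passed to the same captioning model, so that any change in the generated caption can be attributed to the augmentation alone.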

Highlights

  • Image captioning is a complex but very important task because it involves understanding objects and the relations between these objects and their environment

  • The BLEU scores of 443 images increased when the images were blurred before being passed to the image captioning model

  • This work highlights the importance of data augmentation techniques and the extent to which they help improve the performance of image captioning models
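The per-image comparison behind a figure such as "443 images improved after blurring" can be sketched as follows. This is an illustrative plain-Python sentence-level BLEU with uniform 1- to 4-gram weights, clipped n-gram counts, a brevity penalty and a small epsilon in place of zero precisions; the source does not state which BLEU variant, smoothing or tokenization was used, and the image ids and scores in the usage lines are hypothetical, not data from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references: List[List[str]], hypothesis: List[str],
                  max_n: int = 4, eps: float = 1e-9) -> float:
    """BLEU of one tokenized caption against several tokenized references,
    with uniform n-gram weights and eps in place of zero precisions."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        max_ref_counts: Counter = Counter()
        for ref in references:
            max_ref_counts |= ngram_counts(ref, n)  # element-wise max over references
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        log_precision_sum += math.log(max(clipped / total, eps))
    # Brevity penalty against the reference length closest to the hypothesis length.
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision_sum / max_n)


def count_improved(before: Dict[str, float], after: Dict[str, float]) -> int:
    """Number of images whose BLEU is strictly higher after augmentation."""
    return sum(1 for image_id, score in before.items() if after[image_id] > score)


# Usage with toy captions and hypothetical image ids.
refs = [["a", "dog", "runs", "on", "the", "grass"],
        ["a", "brown", "dog", "is", "running", "outside"]]
print(sentence_bleu(refs, ["a", "dog", "runs", "on", "the", "grass"]))  # 1.0
print(sentence_bleu(refs, ["a", "dog", "on", "the", "grass"]))  # small: no 4-gram matches
print(count_improved({"img_1": 0.21, "img_2": 0.35}, {"img_1": 0.27, "img_2": 0.30}))  # 1
```

Scoring every test image once on the original input and once on the augmented input, then counting strict increases, gives the kind of per-technique tally quoted in the highlights.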

Analysis of the Fuzziness of Image Caption Generation Models due to Data Augmentation Techniques

I. INTRODUCTION

Image captioning is a complex but very important task because it involves understanding objects and the relations between these objects and their environment. Much research has been done on methods to improve the performance of deep CNN models, and one such method is data augmentation [7,8]. Overfitting occurs when a neural network learns a function that models the training data too closely, so that it generalizes poorly and performs badly on unseen data. Overfitting can be reduced by giving the model more data to train on, but this is not always possible for real-world problems because data collection can be time-consuming and expensive. Even though data augmentation increases the amount of training data and can improve the performance of a model, it is not usually used for image captioning, mainly because of the two-stage process the problem involves.
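How augmentation enlarges a captioning training set can be sketched in a few lines. This is a hypothetical illustration, not the authors' procedure: the helper names and the two transforms are assumptions, and the key simplifying assumption is that each augmented image keeps its original reference captions unchanged.

```python
from typing import Callable, List, Tuple

import numpy as np

Sample = Tuple[np.ndarray, List[str]]  # (H x W x C image, reference captions)


def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror the image left to right."""
    return img[:, ::-1, :]


def brighten(img: np.ndarray, delta: int = 30) -> np.ndarray:
    """Raise every pixel by delta, clipped to the uint8 range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)


def augment_dataset(samples: List[Sample],
                    transforms: List[Callable[[np.ndarray], np.ndarray]]) -> List[Sample]:
    """Keep every original sample and add one transformed copy per transform.
    Each copy reuses the original captions, assuming they still describe it."""
    augmented = list(samples)
    for image, captions in samples:
        for transform in transforms:
            augmented.append((transform(image), list(captions)))
    return augmented


# Usage: two tiny stand-in images become six training samples.
data = [
    (np.zeros((4, 4, 3), dtype=np.uint8), ["a black square"]),
    (np.full((4, 4, 3), 200, dtype=np.uint8), ["a grey square"]),
]
train = augment_dataset(data, [flip_horizontal, brighten])
print(len(train))  # 6
```

Reusing the captions unchanged is exactly the step that can go wrong for captioning: a horizontal flip, for instance, makes any caption that mentions "left" or "right" incorrect.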

II. RELATED WORK
III. METHODS AND METHODOLOGY
Effect of change in brightness on image captioning
V. CONCLUSION