Abstract
Automatic image caption generation is one of the core problems in deep learning. Data augmentation increases the amount of available training data by applying transformations such as flipping, rotating, zooming, and brightening to the existing images. In this work, we build an image captioning model and test its robustness against all the major types of image augmentation. The results show the fuzziness of the model: given the same image under different augmentation techniques, it produces a different caption each time. We also report how the model's performance changes after these augmentations are applied. The Flickr8k dataset is used for this study, with BLEU score as the evaluation metric for the image captioning model.
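As a rough illustration of the kinds of augmentation named in the abstract, the following minimal sketch applies a flip, a rotation, a brightening step, and a blur to a tiny grayscale image. It is not the pipeline used in the study: the helper names (`flip_horizontal`, `rotate_90_clockwise`, `brighten`, `box_blur`), the list-of-rows image representation, and the brightening factor are all assumptions chosen to keep the example self-contained.

```python
# Hypothetical helpers for illustration only.
# Assumption: a grayscale image is a list of rows of ints in [0, 255].

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def brighten(img, factor=1.5):
    """Scale pixel intensities, clipping to the valid 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def box_blur(img):
    """Replace each pixel with the integer mean of its 3x3 neighbourhood.

    Pixels on the border average over the neighbours that exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

# A made-up 2x3 image; each variant would be captioned separately.
image = [[0, 50, 100],
         [150, 200, 250]]

variants = {
    "original": image,
    "flipped": flip_horizontal(image),
    "rotated": rotate_90_clockwise(image),
    "brightened": brighten(image),
    "blurred": box_blur(image),
}
```

In a robustness check of the kind described, each entry of `variants` would be sent to the same captioning model and the resulting captions compared.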
Highlights
Image captioning is a complex but very important task because it involves understanding objects and how they relate to their environment
The BLEU scores of 443 images increased when the images were blurred before being passed to the image captioning model
This work highlights the importance of data augmentation techniques and the extent to which they help improve the performance of image captioning models
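To make the per-image comparison behind the highlights concrete, here is a minimal sketch that scores two captions of the same image against one reference. It uses a simplified unigram BLEU (clipped unigram precision times a brevity penalty), which is an assumption: the study's exact BLEU configuration is not stated here. The function name `bleu1` and all three caption strings are invented for illustration and are not model outputs or results from the study.

```python
import math
from collections import Counter

def bleu1(reference, candidate):
    """Simplified unigram BLEU: clipped precision times brevity penalty.

    Assumption: whitespace tokenisation, lowercase, a single reference.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    cand_counts = Counter(cand)
    # Each candidate word is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)
    # Penalise candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * precision

# Invented strings, for illustration only.
reference = "a dog runs across the grass"
caption_original = "a dog runs on the grass"    # caption for the unmodified image
caption_augmented = "a dog sits on the grass"   # caption for an augmented copy

score_original = bleu1(reference, caption_original)
score_augmented = bleu1(reference, caption_augmented)

# A positive delta would mean the augmented image scored higher.
delta = score_augmented - score_original
```

Counting the images for which `delta` is positive under a given augmentation yields the kind of tally reported in the highlights.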
Summary
Image captioning is a complex but very important task because it involves understanding objects and how they relate to their environment. A great deal of research has gone into methods for improving the performance of deep CNN models. One such method is data augmentation [7,8]. Overfitting occurs when a neural network learns a function that models the training data perfectly, making it less general and leading to poor performance on unseen data. It can be avoided by giving the model a large amount of data to train on, but that is not always possible in real-world problems, since data collection can be time consuming and expensive. Even though data augmentation increases the amount of training data and helps improve model performance, these techniques are not commonly used for image captioning, mainly because of the two-stage process involved.
Analysis of the Fuzziness of Image Caption Generation Models due to Data Augmentation Techniques
More From: International Journal of Recent Technology and Engineering (IJRTE)