A Survey on Data Augmentation Techniques

K Nanthini, D Sivabalaselvamani, K Chitra, P Gokul, S Kishore, S Kavinkumar

doi:10.1109/iccmc56507.2023.10084010

Abstract

Data augmentation refers to a group of methods that generate new samples from existing data, artificially increasing the size and improving the quality of a training dataset so that more effective machine learning models can be built. Providing a model with more training data can improve its performance. Common methods include rotating, cropping, and flipping images, and adding noise to audio signals. When the original dataset is limited or imbalanced, data augmentation can lead to better generalization and improved performance on unseen data. It also reduces overfitting by increasing the number of training samples available to the model. However, augmentation methods must be selected carefully, with attention to how they will affect the model's performance on real data. Computer vision and natural language processing (NLP) models employ data augmentation strategies to deal with data scarcity and lack of diversity, and these strategies can further improve model accuracy. In an image classification experiment, a deep learning model trained with augmentation achieved lower training loss and validation loss than a model trained without it. This study briefly discusses data augmentation techniques for image, text, and signal data, which increase the volume and variety of training data with the primary goal of helping a machine learning model generalize better. Employing these techniques can reduce overfitting and improve the robustness and accuracy of the model. The main objective of this study is to apply data augmentation as a solution to the problem of data scarcity. The application of augmentation methods based on generative adversarial networks (GANs) is also covered.
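
As an illustration of the basic transformations named in the abstract (flipping, rotating, and cropping images, and adding noise to a signal), the following is a minimal sketch using only the Python standard library. It is not taken from the paper: the function names, the toy 2x3 image, the noise level `sigma`, and the fixed random seed are all assumptions chosen for illustration, and images are represented as plain nested lists rather than arrays.

```python
import random


def horizontal_flip(image):
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def random_crop(image, height, width, rng):
    """Cut a height x width window at a random position inside the image."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def add_gaussian_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise to every sample of a 1-D signal."""
    return [sample + rng.gauss(0.0, sigma) for sample in signal]


# Toy inputs (hypothetical): a 2x3 "image" and a short 1-D "audio" signal.
rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
image = [[1, 2, 3],
         [4, 5, 6]]
signal = [0.0, 0.5, 1.0, 0.5]

flipped = horizontal_flip(image)   # [[3, 2, 1], [6, 5, 4]]
rotated = rotate_90(image)         # [[4, 1], [5, 2], [6, 3]]
cropped = random_crop(image, 2, 2, rng)
noisy = add_gaussian_noise(signal, 0.05, rng)
```

Each function returns a new sample and leaves the original untouched, so one labelled example can yield several training examples that share its label. In practice such transformations are usually applied with an array or deep learning library rather than nested lists.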

Full Text