Abstract
This review paper provides an overview of data pre-processing in Machine learning, focusing on all types of problems while building the machine learning problems. It deals with two significant issues in the pre-processing process (i). issues with data and (ii). Steps to follow to do data analysis with its best approach. As raw data are vulnerable to noise, corruption, missing, and inconsistent data, it is necessary to perform pre-processing steps, which is done using classification, clustering, and association and many other pre-processing techniques available. Poor data can primarily affect the accuracy and lead to false prediction, so it is necessary to improve the dataset's quality. So, data pre-processing is the best way to deal with such problems. It makes the knowledge extraction from the data set much easier with cleaning, Integration, transformation, and reduction methods. The issue with Data missing and significant differences in the variety of data always exists as the information is collected through multiple sources and from a real-world application. So, the data augmentation approach generates data for machine learning models. To decrease the dependency on training data and to improve the performance of the machine learning model. This paper discusses flipping, rotating with slight degrees and others to augment the image data and shows how to perform data augmentation methods without distorting the original data.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have