Abstract

This paper presents a procedure for identifying and handling data anomalies during the preliminary data processing stage of machine learning tasks. The procedure consists of three stages. In the first stage, outliers are detected in the data samples. Many methods are available for this, and the choice of a particular method depends on the machine learning task, the structure of the data set, and the types of data being processed. The methods used at this stage include statistical tests, metric tests, model-based tests, iterative methods, machine learning methods, and ensemble methods. In the second stage, the causes of the outliers are analyzed. These causes include measurement errors, data processing errors, external influences, and errors in data records. In the third stage, the data sets containing outliers undergo final processing, in which the outliers are removed or normalizing transformations are applied. The effectiveness of the procedure was tested on several data sets.
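
The first and third stages can be illustrated with a minimal sketch. The abstract does not name specific techniques, so the choices here are assumptions made for illustration only: Tukey's interquartile-range fences stand in for the statistical test, and a log transform stands in for the normalizing transformation. The function names and the sample values are hypothetical. The second stage (analysis of causes) depends on domain knowledge and is not shown.

```python
import math
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Stage 1 (assumed method): flag values outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def remove_outliers(values, mask):
    """Stage 3, option A: drop the flagged values."""
    return [v for v, is_outlier in zip(values, mask) if not is_outlier]


def log_transform(values):
    """Stage 3, option B (assumed transform): compress large values.

    log1p requires every value to be greater than -1.
    """
    return [math.log1p(v) for v in values]


# Hypothetical sample with one suspicious reading at the end.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 55.0]

mask = iqr_outlier_mask(sample)
cleaned = remove_outliers(sample, mask)
transformed = log_transform(sample)
```

Whether to remove a flagged value or transform the data would normally follow from the second stage: a value traced to a recording error is a candidate for removal, while a genuine but extreme observation may be better kept and rescaled.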
