Abstract

The development of artificial intelligence systems calls for methods that improve the quality of information processing. The values of target variables and predictors are shaped by external and internal factors that affect their ranges. Phenomena such as concept drift can cause a model to lose completeness and accuracy over time. To improve the quality of machine learning algorithms, a method of automatic data segmentation based on clustering is proposed. Classification models are trained on sets of examples that may contain outliers, noisy data, and an imbalance among the observed objects, all of which affect the quality indicators of the results. In many systems, the time intervals over which factors act can be determined and formalized in advance. The sample prepared for machine learning usually consists of tuples whose values were obtained under different conditions. The effect of these factors can be determined empirically or automatically, for example with clustering methods. The distinguishing feature of the proposed method is that the sample is divided into subsamples based on information about the factors that influence the value ranges of the variables. Isolating individual influences makes it possible to identify the data regions in which a factor was active, evaluate their properties, and assign to each region the classification algorithm with the best quality indicators. Results of experiments on several data sets are presented; they show that the proposed solution improves the quality of processing. The model can be regarded as an improvement on ensemble methods for processing information flows and data samples. When data properties change, training a separate algorithm on a local segment reduces computational costs compared with training a complex data processing model.
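
The segmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the candidate classifiers, or the quality indicator, so the choices here are assumptions made for illustration only. The sketch assumes plain k-means with a deterministic farthest-point initialization, two toy candidates (`MajorityClass` and `NearestCentroid`, both hypothetical names), and hold-out accuracy as the selection criterion. The toy data contains two segments whose labeling rule is inverted, a crude stand-in for a factor that changes the relationship between predictors and target.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=10):
    """Plain k-means (assumed here; the abstract only says 'clustering')."""
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = mean(members)
    return centers

class MajorityClass:
    """Toy candidate: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

class NearestCentroid:
    """Toy candidate: predicts the class whose centroid is closest."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centers = [mean([p for p, t in zip(X, y) if t == c])
                        for c in self.classes]
        return self
    def predict(self, x):
        return self.classes[nearest(x, self.centers)]

def accuracy(model, X, y):
    return sum(model.predict(p) == t for p, t in zip(X, y)) / len(y)

class SegmentedClassifier:
    """Cluster the sample, then pick the best candidate per segment."""
    def __init__(self, k, candidates=(MajorityClass, NearestCentroid)):
        self.k, self.candidates = k, candidates
    def fit(self, X, y):
        self.centers = kmeans(X, self.k)
        seg = [nearest(p, self.centers) for p in X]
        self.default = MajorityClass().fit(X, y)  # fallback for empty segments
        self.models = {}
        for s in range(self.k):
            Xs = [p for p, g in zip(X, seg) if g == s]
            ys = [t for t, g in zip(y, seg) if g == s]
            if not Xs:
                continue
            # Simple hold-out split: even positions train, odd positions validate.
            Xtr, ytr = Xs[0::2], ys[0::2]
            Xva, yva = (Xs[1::2], ys[1::2]) if len(Xs) > 1 else (Xs, ys)
            best = max(self.candidates,
                       key=lambda C: accuracy(C().fit(Xtr, ytr), Xva, yva))
            self.models[s] = best().fit(Xs, ys)  # refit winner on the full segment
        return self
    def predict(self, x):
        s = nearest(x, self.centers)
        return self.models.get(s, self.default).predict(x)

# Toy data: the first feature separates two segments, and the labeling
# rule on the second feature is inverted between them.
levels = [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0]
X = [[0.0, v] for v in levels] + [[10.0, v] for v in levels]
y = [int(v > 0.5) for v in levels] + [int(v < 0.5) for v in levels]
model = SegmentedClassifier(k=2).fit(X, y)
```

On this toy sample a single nearest-centroid model fitted to all the data could not separate the classes, because both class centroids coincide. Fitting one model per segment recovers the local rule in each region, which is the effect the abstract attributes to segmentation.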

