Abstract

Aim. Feature transformation is a stage of machine learning application that has a significant effect on the quality of regression models. The paper aims to develop criteria for evaluating the quality of data dimensionality reduction at the feature transformation stage and to adapt the UMAP method to the problem of predicting the number of days to failure in the locomotives of JSC RZD.

Methods. Data transformation methods are divided into two groups: those that attempt to preserve the global data structure and those that attempt to preserve the distances between points. The paper examines in detail UMAP, a nonlinear method of dimensionality reduction whose low-dimensional data representation is based on a transformation of a nearest-neighbour graph that retains the data structure. The structure of the initial data manifold is examined using topological data analysis and fuzzy simplicial set construction methods.

Results. The analysis of UMAP theory, conducted in the Russian language for the first time, enabled a substantiated identification of the three primary parameters of the method, whose variation significantly affects the form of the transformed data, in particular the quality of class separation in a two-dimensional space. Additionally, the characteristics of the input set of parameters that affect the UMAP results were identified. Practical results of UMAP application were demonstrated. Intermediate results included a list of nearest neighbours and a weighted graph of nearest neighbours. The fundamental result is a low-dimensional data representation (reduced from 44 initial measurements) in a two-dimensional space with class separation, which is confirmed both by calculations and visually.

Conclusions. UMAP was found to be an efficient and substantiated method of dimensionality reduction that, through parameter variation, allows data to be transformed so as to improve the quality of data submitted to machine learning models by the criterion of “evident class separation”. The transformation is an intermediate stage of data preparation for regression model application, and class separation was performed in order to rule out gross regression errors.
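
The two intermediate results named above, a list of nearest neighbours and a weighted nearest-neighbour graph, can be illustrated with a minimal sketch. It is not the paper's code and does not use the locomotive data: the points are synthetic (two Gaussian clusters in 44 dimensions, chosen only to mirror the dimensionality mentioned in the abstract), and the names `knn_list`, `smooth_sigma`, `fuzzy_graph` and the value `K = 5` are hypothetical. The weighting follows the commonly described UMAP construction in simplified form (Euclidean metric, local connectivity of 1), so it should be read as an approximation of the idea rather than a reference implementation.

```python
import math
import random


def knn_list(points, k):
    """For each point, return its k nearest neighbours as (distance, index), closest first."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def smooth_sigma(dists, rho, k, n_iter=64):
    """Binary-search sigma so that sum(exp(-max(0, d - rho) / sigma)) is close to log2(k)."""
    target = math.log2(k)
    lo, hi, mid = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(0.0, d - rho) / mid) for d in dists)
        if total > target:
            hi = mid                      # sigma too large: weights decay too slowly
            mid = (lo + hi) / 2.0
        else:
            lo = mid                      # sigma too small: weights decay too quickly
            mid = mid * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    return mid


def fuzzy_graph(points, k):
    """Return the neighbour list and a symmetric weighted graph {(i, j): weight} with i < j."""
    neighbours = knn_list(points, k)
    directed = {}
    for i, nbrs in enumerate(neighbours):
        dists = [d for d, _ in nbrs]
        rho = dists[0]                    # distance to the closest neighbour
        sigma = smooth_sigma(dists, rho, k)
        for d, j in nbrs:
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)
    graph = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)     # weight of the reverse edge, if any
        graph[(min(i, j), max(i, j))] = a + b - a * b   # fuzzy union of both directions
    return neighbours, graph


# Synthetic stand-in data (assumption): two clusters of 15 points each in 44 dimensions.
random.seed(0)
K = 5
points = [[random.gauss(c, 1.0) for _ in range(44)] for c in (0.0, 5.0) for _ in range(15)]
neighbours, graph = fuzzy_graph(points, K)
print(len(neighbours), "neighbour lists,", len(graph), "weighted edges")
```

In this sketch the neighbourhood size `K` controls how much local versus global structure the graph captures, which is why a parameter of this kind can noticeably change the resulting embedding. The subsequent layout optimisation in two dimensions is omitted here.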

