METHOD OF BUILDING ENSEMBLES OF MODELS FOR DATA CLASSIFICATION BASED ON DECISION CORRELATIONS

Tetyana SKRYPNYK,Myroslav STEBELETSKYI,Eduard MANZIUK,Ruslan BAHRIY

doi:10.31891/2307-5732-2022-315-6-224-233

Abstract

The scientific work highlights the problem of increasing the accuracy of binary classification predictions using machine learning algorithms. Over the past few decades, systems that consist of many machine learning algorithms, also called ensemble models, have received increasing attention in the computational intelligence and machine learning community. This attention is well deserved, as ensemble systems have proven to be very effective and extremely versatile in a wide range of problem domains and real-world applications. One algorithm may not make a perfect prediction for a particular data set. Machine learning algorithms have their limitations, so creating a model with high accuracy is a difficult task. If you create and combine several models by combining and aggregating the results of each model, there is a chance to improve the overall accuracy, this problem is dealt with by ensembling. The basis of the information system of binary classification is the ensemble model. This model, in turn, contains a set of unique combinations of basic classifiers – a kind of algorithmic primitives. An ensemble model can be considered as some kind of meta-algorithm, which consists of unique sets of machine learning (ML) classification algorithms. The task of the ensemble model is to find such a combination of basic classification algorithms that would give the highest performance. The performance is evaluated according to the main ML metrics in classification tasks. Another aspect of scientific work is the creation of an aggregation mechanism for combining the results of basic classification algorithms. That is, each unique combination within the ensemble consists of a set of basic models (harbingers), the results of which must be aggregated. In this work, a non-hierarchical clustering method is used to aggregate (average) the predictions of the base models. A feature of this study is to find the correlation coefficients of the base models in each combination. With the help of the magnitude of correlations, the relationship between the prediction of the classifier (base model) and the true value is established, as a result of which space is opened for further research on improving the ensemble model (meta-algorithm)

Full Text