DEVELOPMENT OF A BINARY CLASSIFICATION MODEL BASED ON SMALL DATA USING MACHINE LEARNING METHODS

S.S. Mikhaylova, N.V. Grineva

doi:10.33693/2541-8025-2024-20-1-129-140

Abstract

Today, machine learning solutions to binary classification problems are applied in many fields, such as medicine, energy, marketing, agriculture and financial analytics. They give companies new sources of profit and ways to improve existing processes. As a result, new methods are being actively developed, existing ones are being improved, and the use of machine learning for classification problems in various fields is being investigated. Because most development effort is directed toward Big Data, it is highly relevant to study how effective different machine learning methods are for binary classification when the data are small and have the problems typical of small datasets. The study identifies problems of small data that can reduce the effectiveness of a trained model and proposes several ways to address them. To assess how these problems affect model quality, the quality metrics of models trained on differently preprocessed versions of the data were compared. The study concludes that working correctly with small data requires timely elimination of data defects such as class imbalance and outliers. The quality metrics most relevant to a model for analyzing medical parameters were selected, and diabetes detection models trained on the preprocessed small data were compared. For this task, the stacking model was chosen as the best option for medical use. The results show that machine learning can be highly effective in solving real binary classification problems.
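The abstract names three small-data steps: removing outliers, correcting class imbalance, and judging models by carefully chosen quality metrics. It does not say which techniques were used. The following is a minimal standard-library sketch that assumes Tukey's IQR fences for outliers, plain random oversampling (not SMOTE) for imbalance, and precision/recall/F1 on the positive ("disease present") class as metrics. The function names `iqr_mask`, `random_oversample` and `binary_metrics` are hypothetical, and the toy values in the usage lines are illustrative only, not the study's data. The stacking ensemble itself is not reproduced here; in scikit-learn it is available as `sklearn.ensemble.StackingClassifier`.

```python
import random
from collections import Counter
from statistics import quantiles


def iqr_mask(values, k=1.5):
    """Return booleans: True where a value lies inside the Tukey fences
    [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule, k=1.5 by convention)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class. In a medical
    screening setting, recall (share of sick patients detected) is
    usually the metric one least wants to sacrifice."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy glucose-like readings; 400 falls outside the fences [60, 140].
    glucose = [85, 90, 95, 100, 105, 110, 400]
    print(iqr_mask(glucose))

    # Toy imbalanced labels: three negatives, one positive.
    rows, labels = random_oversample([[1], [2], [3], [4]], [0, 0, 0, 1])
    print(Counter(labels))

    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Oversampling should be applied only to the training split; duplicating rows before the train/test split would leak copies of test patients into training and inflate the metrics, which matters even more when the dataset is small.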

Full Text