Binary classification based on a combination of rough set theory and decision trees

Dmytro Chernyshov, Dmytro Sytnikov

doi:10.30837/itssi.2023.26.087

Abstract

The subject of the study is improving the accuracy and efficiency of decision tree classification algorithms by integrating the principles of rough set theory, a mathematical approach to approximating sets. The aim of the study is to develop a hybrid model that integrates rough set theory with decision tree algorithms, thereby addressing the inherent limitations of these algorithms in dealing with uncertainty in data. This integration is intended to significantly improve the accuracy and efficiency of binary classification based on decision trees and to make it more robust to varied inputs. The research objectives include an in-depth study of possible synergies between rough set theory and decision tree algorithms, and the development of a model that uses the principles and algebraic tools of rough set theory to select features more efficiently in decision tree-based systems. The model uses rough set theory to handle uncertainty and weighting, which improves and extends feature selection in decision tree systems. A series of experiments is conducted on different datasets, chosen to represent a range of complexities and uncertainties, to demonstrate the effectiveness and practicality of the approach. The methodology uses the algebraic tools of rough set theory, including the formulation of algebraic expressions and the development of new rules and techniques, to simplify data classification with decision tree systems and improve its accuracy. The findings show that integrating rough set theory into decision tree algorithms can provide more accurate and efficient classification results.
The hybrid model shows significant advantages on data with inherent uncertainty, which is a common challenge in many practical scenarios. The versatility and effectiveness of the integrated approach are demonstrated by its successful application in credit scoring and cybersecurity, which underlines its potential as a general-purpose tool in data mining and machine learning. The conclusions are that, by improving the ability of decision trees to account for uncertainty and imprecision in data, the research opens up new possibilities for robust data analysis and interpretation in a variety of industries, from healthcare to finance and beyond. The integration of rough set theory and decision trees is an important step toward more advanced, efficient, and accurate classification tools in the era of big data.
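To make the rough set notions mentioned in the abstract concrete, the following is a minimal sketch of the standard Pawlak lower and upper approximations and of the dependency degree that is commonly used to rank feature subsets before a tree is grown. It is not the authors' model: the function names, the toy credit-style records, and the use of the dependency degree as the selection criterion are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximation of a set of row indices."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:      # block lies entirely inside the target set
            lower |= block
        if block & target:       # block overlaps the target set
            upper |= block
    return lower, upper


def dependency(rows, attrs, labels):
    """Dependency degree: size of the positive region divided by the number of rows."""
    classes = defaultdict(set)
    for i, y in enumerate(labels):
        classes[y].add(i)
    positive = set()
    for target in classes.values():
        lower, _ = approximations(rows, attrs, target)
        positive |= lower
    return len(positive) / len(rows)


# Hypothetical toy data: rows 3 and 4 agree on every attribute but differ in label,
# which is the kind of uncertainty that rough sets are designed to describe.
rows = [
    {"income": "high", "debt": "low"},
    {"income": "high", "debt": "high"},
    {"income": "low", "debt": "low"},
    {"income": "low", "debt": "high"},
    {"income": "low", "debt": "high"},
    {"income": "high", "debt": "low"},
]
labels = [1, 1, 1, 0, 1, 1]

approved = {i for i, y in enumerate(labels) if y == 1}
lower, upper = approximations(rows, ["income", "debt"], approved)
gamma_both = dependency(rows, ["income", "debt"], labels)
gamma_income = dependency(rows, ["income"], labels)
```

Here `lower` holds the records that are certainly in the positive class given the chosen attributes, `upper` holds those that are possibly in it, and the gap between the two is the boundary region. Comparing `gamma_both` with `gamma_income` shows how much classification certainty is lost when an attribute is dropped, which is one plausible way a rough set criterion could guide feature selection for a decision tree.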
