Decision tree algorithms are widely used in machine learning, and their central challenge is devising an optimal strategy for splitting the sample subspace at each node. For continuous data, conventional approaches either fuzzify the data or adopt a binary (dichotomous) splitting scheme like that of CART. However, fuzzifying continuous features often entails information loss, whereas the dichotomous approach can generate an excessive number of classification rules, potentially leading to overfitting. To address these limitations, this study introduces an adaptive growth decision tree framework, termed the fuzzy neighborhood decision tree (FNDT). First, we establish a fuzzy neighborhood decision model based on the concept of fuzzy inclusion degree. We then analyze the topological structure of misclassified samples under the proposed decision model, providing a theoretical foundation for the construction of FNDT. Next, we use conditional information entropy to screen the original features, prioritizing those that offer the maximum information gain for decision tree nodes. Using the conditional decision partitions derived from the fuzzy neighborhood decision model, we obtain an adaptive splitting method for the optimal features, yielding an adaptive growth decision tree algorithm that relies solely on the inherent structure of real-valued data. Experimental evaluations show that, compared with advanced decision tree algorithms, FNDT has a simpler tree structure, stronger generalization capability, and superior performance in classifying continuous data.
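The feature-selection step described above, ranking features by information gain computed from conditional entropy, can be illustrated with a minimal sketch. This is not the FNDT algorithm itself: the abstract does not define the fuzzy neighborhood partition, so the hypothetical `midpoint_partition` helper below substitutes a simple CART-style binary split purely as a placeholder. All function names here are assumptions introduced for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, blocks):
    """H(labels | partition); blocks[i] is the block id of sample i."""
    n = len(labels)
    groups = {}
    for y, b in zip(labels, blocks):
        groups.setdefault(b, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(labels, blocks):
    """Reduction in label entropy obtained by conditioning on the partition."""
    return entropy(labels) - conditional_entropy(labels, blocks)


def midpoint_partition(values):
    """Placeholder partition (assumption): binary split at the range midpoint.

    FNDT would instead derive the partition from its fuzzy neighborhood
    decision model, which the abstract does not specify.
    """
    threshold = (min(values) + max(values)) / 2
    return [int(v > threshold) for v in values]


def best_feature(X, y, partition_fn=midpoint_partition):
    """Return (index of the feature with maximum information gain, all gains)."""
    n_features = len(X[0])
    gains = [
        information_gain(y, partition_fn([row[j] for row in X]))
        for j in range(n_features)
    ]
    best = max(range(n_features), key=gains.__getitem__)
    return best, gains


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 5.1], [0.9, 1.1]]
y = [0, 0, 1, 1]
best, gains = best_feature(X, y)
```

Passing a different `partition_fn` is the intended extension point: any rule that maps a feature column to block ids, including a multi-way neighborhood-based one, can be ranked with the same gain computation.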