Abstract

In supervised classification, decision tree and rule induction algorithms have the desirable ability to build understandable models. The PART algorithm creates partially developed C4.5 decision trees and extracts one rule from each tree. Some of the criteria this algorithm uses can be modified to yield better results. In this work, we propose 16 variants of the PART algorithm and compare them in terms of discriminating capacity, model complexity, and computational cost on 36 real-world problems from the UCI repository. Using the Best-First optimization algorithm to select the next node to develop in a partial tree improves on the results of the PART algorithm. The best-performing variant also ranks first when compared with the original PART algorithm, the well-established C4.5 algorithm, and a modified version of the CHAID decision tree induction algorithm that handles continuous features, which is also proposed in this paper. For all performance measures, we test the results for statistical significance using state-of-the-art methods.
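To make the idea of best-first node selection concrete, here is a minimal sketch of growing a tree best-first: every open node sits in one global frontier, and the node with the highest priority is always developed next. It assumes categorical features and uses lowest class entropy as the priority. The abstract does not state the criterion used by the proposed variants, so this choice is an assumption. The function names and the `max_expansions` budget are hypothetical. The pruning and rule-extraction steps that PART also performs are omitted.

```python
import heapq
import itertools
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy (bits) of the class labels in rows [(features, label), ...]."""
    n = len(rows)
    counts = Counter(label for _, label in rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split(rows, attr):
    """Partition rows by the value of a categorical attribute."""
    parts = {}
    for feats, label in rows:
        parts.setdefault(feats[attr], []).append((feats, label))
    return parts


def best_attribute(rows, attrs):
    """Pick the attribute whose split gives the lowest weighted child entropy."""
    n = len(rows)

    def weighted(attr):
        return sum(len(p) / n * entropy(p) for p in split(rows, attr).values())

    return min(attrs, key=weighted)


def best_first_expand(rows, attrs, max_expansions=10):
    """Grow a tree best-first (hypothetical helper).

    The open node with the lowest entropy (an ASSUMED priority) is always
    developed next, chosen across the whole frontier rather than depth-first.
    Returns the paths of expanded nodes and of nodes closed as leaves.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares paths or rows
    frontier = [(entropy(rows), next(tie), (), rows, tuple(attrs))]
    expanded, leaves = [], []
    while frontier and len(expanded) < max_expansions:
        h, _, path, node_rows, remaining = heapq.heappop(frontier)
        if h == 0 or not remaining:
            leaves.append(path)  # pure node or no attributes left: close it
            continue
        attr = best_attribute(node_rows, remaining)
        expanded.append(path)
        rest = tuple(a for a in remaining if a != attr)
        for value, part in split(node_rows, attr).items():
            child_path = path + ((attr, value),)
            heapq.heappush(frontier, (entropy(part), next(tie), child_path, part, rest))
    return expanded, leaves
```

For a toy dataset with attributes `outlook` and `windy`, the root is split on `outlook`. The two pure children (`sunny`, `overcast`) are closed before the impure `rain` node is developed further on `windy`. A priority queue keeps each selection at O(log n) in the size of the frontier.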
