Abstract

Standard decision tree algorithms and the methods derived from them are usually built on frequency information. For continuous attributes, however, they face a dilemma, or more generally a multichotomous question, when two or more candidate cut points have a splitting performance equal or close to the optimal value, such as the maximal information gain ratio or the minimal Gini index. In this paper, we propose a unified framework to address this question. We then design two algorithms based on Splitting Performance and the number of Expected Segments, called SPES1 and SPES2, which determine the optimal cut point in three steps. First, several candidate cut points are selected whose splitting performance is closest to the optimal value. Second, the number of expected segments is computed for each candidate cut point. Finally, the two measures are combined through a weighting factor $\alpha$ to choose the optimal cut point among the candidates. To validate the effectiveness of our methods, we evaluate them on 25 benchmark datasets. The experimental results show that the classification accuracies of the proposed algorithms exceed those of the current state-of-the-art methods for handling the multichotomous question, by about 5% in some cases. In particular, under the proposed methods, the number of candidate cut points converges to a certain extent.
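The three steps can be illustrated with a minimal Python sketch. It is not the SPES1 or SPES2 procedure itself: the abstract does not give the scoring formula or the way expected segments are computed, so several things here are assumptions made only for illustration. The names `Candidate` and `select_cut`, the tolerance `tol`, the normalisation, the convex combination weighted by `alpha`, and the preference for fewer expected segments are all hypothetical. The splitting performance is assumed to be a measure where higher is better (such as information gain ratio), and the expected segment counts are taken as precomputed inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cut: float                # cut point on the continuous attribute
    performance: float        # splitting performance; higher is assumed better
    expected_segments: float  # assumed to be precomputed elsewhere


def select_cut(candidates, alpha=0.5, tol=0.01):
    """Pick one cut point among near-optimal candidates (illustrative only)."""
    if not candidates:
        raise ValueError("at least one candidate cut point is required")

    best = max(c.performance for c in candidates)

    # Step 1: keep the candidates whose performance is within tol of the best.
    shortlist = [c for c in candidates if best - c.performance <= tol]

    # Step 2: expected segment counts are read from the inputs (assumption).
    max_seg = max(c.expected_segments for c in shortlist)

    # Step 3: combine both measures with the weighting factor alpha.
    # Hypothetical rule: reward performance, penalise many expected segments.
    def score(c):
        perf_term = c.performance / best if best > 0 else 0.0
        seg_term = c.expected_segments / max_seg if max_seg > 0 else 0.0
        return alpha * perf_term - (1 - alpha) * seg_term

    return max(shortlist, key=score)


candidates = [
    Candidate(cut=2.5, performance=0.400, expected_segments=6),
    Candidate(cut=4.5, performance=0.395, expected_segments=3),
    Candidate(cut=7.0, performance=0.200, expected_segments=1),
]
chosen = select_cut(candidates, alpha=0.5, tol=0.01)
print(chosen.cut)
```

In this sketch, `alpha=1.0` reduces the choice to the splitting performance alone, while smaller values give more weight to the segment count. A measure that is minimised, such as the Gini index, would need its sign flipped before being passed in.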
