Abstract

Continuous attributes are difficult to handle and require special treatment in decision tree induction algorithms. In this paper, we present RCAT, a multisplitting algorithm for continuous attributes based on statistical information. When computing the information gain of a continuous attribute, RCAT first splits the attribute's value range into a set of initial intervals and estimates the probability of each class within every interval. It then finds the best threshold in this probability space, uses the threshold to separate the initial intervals into two sets, merges adjacent intervals belonging to the same set, optimizes the boundary of every merged interval, and finally obtains the information gain of the attribute. We also provide a pruning method to simplify the resulting decision trees. Empirical results show that RCAT produces decision trees that are considerably more intelligible than those of C4.5 while retaining comparable accuracy.
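
To make the interval-merging idea concrete, the sketch below evaluates one continuous attribute roughly along the lines the abstract describes: build initial intervals, estimate a per-interval class probability, threshold that probability to split the intervals into two sets, merge adjacent intervals on the same side, and score the resulting multi-way split by information gain. The function name `rcat_like_split`, the equal-frequency initial binning, the exhaustive threshold search, and the omission of the boundary-optimization step are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of an RCAT-style multisplit evaluation for one continuous
# attribute. Interval count, threshold search, and helper names are
# assumptions for illustration, not the paper's exact procedure.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def rcat_like_split(values, labels, positive_class, n_initial=10):
    """Return (merged_intervals, information_gain) for one continuous attribute.

    values, labels : parallel lists (attribute value, class label).
    positive_class : class whose per-interval probability defines the
                     one-dimensional probability space that is thresholded.
    n_initial      : number of equal-frequency initial intervals (assumption).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # 1. cut the value range into initial equal-frequency intervals
    size = max(1, n // n_initial)
    intervals = [pairs[i:i + size] for i in range(0, n, size)]

    # 2. probability estimate of the reference class in each interval
    probs = [sum(1 for _, y in iv if y == positive_class) / len(iv)
             for iv in intervals]

    # 3. try each candidate threshold in probability space and keep the best
    base = entropy([y for _, y in pairs])
    best_gain, best_groups = -1.0, None
    for t in sorted(set(probs)):
        flags = [p >= t for p in probs]
        # 4. merge adjacent intervals that fall on the same side of t
        groups, cur = [], list(intervals[0])
        for iv, f, prev in zip(intervals[1:], flags[1:], flags):
            if f == prev:
                cur.extend(iv)
            else:
                groups.append(cur)
                cur = list(iv)
        groups.append(cur)
        # 5. information gain of the resulting multi-way split
        gain = base - sum(len(g) / n * entropy([y for _, y in g])
                          for g in groups)
        if gain > best_gain:
            best_gain, best_groups = gain, groups
    # Boundary optimization of the merged intervals (described in the paper)
    # is omitted in this sketch.
    return best_groups, best_gain

# Minimal usage example on toy data
values = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
labels = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'a', 'a', 'b', 'b', 'b']
groups, gain = rcat_like_split(values, labels, positive_class='a', n_initial=4)
print(len(groups), round(gain, 3))
```

The threshold in probability space is what distinguishes this scheme from a plain binary split on the raw attribute value: intervals that are not contiguous in value but have similar class distributions can still end up grouped together before the merge step enforces contiguity.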
