Abstract

In training a Decision Tree (DTr), the choice of the criterion by which the attribute is selected at each node is a key point. The Chi-Square (CS) measure is used in one of the best-known DTr algorithms, CHAID. However, this criterion tends to favour attributes with more values. In this paper I try to show that a change to this criterion can improve its performance. I present the results of experiments performed with DTr (unpruned and pruned) on seven databases. Alongside the choice of splitting criterion, the method of pruning the DTr is perhaps just as important. For this reason, I wanted to highlight which of the three types of DTr (unpruned, pessimistically pruned, or error-based pruned) behaves better on classification and prediction problems in the Data Science field. The experiments presented in the paper show that the modified version of the CS criterion systematically achieves a better classification error rate on the test data (CERTD). At the same time, the performance achieved by DTr pruning based on confidence intervals (error-based pruning) systematically exceeds that of the other two DTr variants.
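The abstract does not spell out the formulas, so the following is only a rough sketch of what a CS-based split score looks like: the chi-square statistic of the contingency table between an attribute's values and the class labels at a node. The normalize option (division by the table's degrees of freedom) is one plausible illustration of a correction for the bias toward multi-valued attributes; it is an assumption for illustration, not the modification actually proposed in the paper.

    import numpy as np

    def chi_square_split_score(attribute_values, class_labels, normalize=False):
        """Chi-square statistic of the (attribute value x class) contingency table.

        If normalize is True, the statistic is divided by its degrees of freedom,
        a hypothetical correction for the bias toward attributes with many values.
        """
        attribute_values = np.asarray(attribute_values)
        class_labels = np.asarray(class_labels)
        values = np.unique(attribute_values)
        classes = np.unique(class_labels)
        n = class_labels.size

        # Observed counts: rows = attribute values, columns = classes.
        observed = np.array(
            [[np.sum((attribute_values == v) & (class_labels == c)) for c in classes]
             for v in values],
            dtype=float,
        )

        # Expected counts under independence of attribute and class.
        row_tot = observed.sum(axis=1, keepdims=True)
        col_tot = observed.sum(axis=0, keepdims=True)
        expected = row_tot * col_tot / n

        # Sum (O - E)^2 / E over cells with non-zero expected count.
        mask = expected > 0
        chi2 = float((((observed - expected) ** 2)[mask] / expected[mask]).sum())

        if normalize:
            dof = (len(values) - 1) * (len(classes) - 1)
            chi2 /= max(dof, 1)
        return chi2

    # Toy usage: compare the raw CHAID-style score with the normalized variant.
    attr = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
    cls = ["no", "no", "yes", "no", "yes", "yes"]
    print(chi_square_split_score(attr, cls))
    print(chi_square_split_score(attr, cls, normalize=True))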
