Abstract

This paper presents a comparative study of the attribute selection measures most commonly used in the construction of decision trees. We examine how these measures affect the resulting tree structures under various sampling policies. Earlier work in this field has emphasized the overall size of the tree, measured by the number of levels and the number of leaf nodes. We take a more informative view that incorporates the functionality of decision trees into the analysis of their structure. The proposed evaluation criterion combines classification proportion with the combinatorial structure of the tree. Our experiments show that, for unpruned trees, the information-based measures outperform the non-information-based ones across classification proportion thresholds and most sampling policies. Among the information-based measures, information gain appears to perform best. Pruning improves the performance of the statistics-based measures. We also show that there are optimal combinations of attribute selection measures and sampling policies with respect to the best achievable classification thresholds.
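To make the notion of an attribute selection measure concrete, the following is a minimal sketch of how a tree builder might score candidate attributes at a node. It assumes the standard entropy-based definition of information gain; the abstract does not give the exact formulas used in the study. The Gini impurity reduction is included only as an example of a widely used non-information-based measure, and is not claimed to be one of the measures compared in the paper. The function names (`entropy`, `information_gain`, `gini_reduction`, `select_attribute`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _partition(labels, attribute_values):
    """Group class labels by the value the attribute takes on each example."""
    parts = {}
    for value, label in zip(attribute_values, labels):
        parts.setdefault(value, []).append(label)
    return list(parts.values())


def _impurity_reduction(impurity, labels, attribute_values):
    """Parent impurity minus the size-weighted impurity of the child partitions."""
    n = len(labels)
    weighted = sum(len(p) / n * impurity(p) for p in _partition(labels, attribute_values))
    return impurity(labels) - weighted


def information_gain(labels, attribute_values):
    """Information-based measure: reduction in entropy after the split."""
    return _impurity_reduction(entropy, labels, attribute_values)


def gini_reduction(labels, attribute_values):
    """Non-information-based example: reduction in Gini impurity after the split."""
    return _impurity_reduction(gini, labels, attribute_values)


def select_attribute(labels, columns, measure=information_gain):
    """Return the attribute name whose split scores highest under `measure`.

    `columns` maps each (hypothetical) attribute name to its list of values,
    aligned index-by-index with `labels`.
    """
    return max(columns, key=lambda name: measure(labels, columns[name]))


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    columns = {
        "outlook": ["sunny", "sunny", "rain", "rain"],  # separates the classes perfectly
        "windy": ["true", "false", "true", "false"],  # carries no class information
    }
    for name, values in columns.items():
        print(name, information_gain(labels, values), gini_reduction(labels, values))
    print("chosen:", select_attribute(labels, columns))
```

In this toy data, `outlook` yields an information gain of 1 bit and `windy` yields 0, so both measures pick `outlook`. On real data the measures can disagree on which attribute to split, which is what leads to the differences in tree structure that the study compares.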
