Abstract
Decision tree models enjoy a special status in predictive modeling because they are considered comprehensible for human analysis and insight. The Classification and Regression Trees (CART) algorithm is one of the best-known decision tree induction algorithms, addressing both classification and regression problems. Finding optimal values for the hyperparameters of a decision tree construction algorithm is a challenging problem. To build an effective decision tree classifier with high accuracy and comprehensibility, we must set optimal values for hyperparameters such as the maximum size of the tree, the minimum number of instances required in a node to induce a split, the node splitting criterion, and the amount of pruning. The hyperparameter setting strongly influences the performance of the decision tree model, and no single setting works equally well across datasets: a setting that yields an optimal decision tree for one dataset may produce a sub-optimal model for another. In this paper, we present a hyper-heuristic approach for tuning the hyperparameters of Recursive Partitioning and Regression Trees (rpart), a standard implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper-heuristic for tuning the hyperparameters of the decision tree classifier, and name the approach Hyper-heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. Statistical tests show that HEARpart performs significantly better than WEKA's J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper-heuristic algorithm constructs significantly more comprehensible models than WEKA's J48, CART, and similar decision tree construction strategies, although the accuracy it achieves is slightly lower than that of the comparative approaches.
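For concreteness, the four hyperparameters named above correspond directly to arguments of R's rpart package (this mapping reflects standard rpart usage, not code from the paper). A minimal sketch, assuming a data frame train with a factor column class:

library(rpart)

# Fit a classification tree with the four tuned hyperparameters made explicit.
fit <- rpart(
  class ~ ., data = train, method = "class",
  parms   = list(split = "gini"),        # splitting criterion: "gini" or "information"
  control = rpart.control(
    minsplit = 20,    # minimum instances in a node before a split is attempted
    maxdepth = 10,    # maximum depth of any node in the final tree
    cp       = 0.01   # complexity parameter: larger values prune more aggressively
  )
)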
Highlights
Decision trees, a class of supervised learning algorithms, are widely used for addressing classification problems in data mining and machine learning
The results showed that all the tuning techniques performed on par with one another, but significantly better than the decision tree models generated with the default hyperparameters of the J48 algorithm
The performance of decision tree classifiers generated by HEARpart was tested on 30 datasets
Summary
Decision trees, a class of supervised learning algorithms, are widely used for addressing classification problems in data mining and machine learning. Like other machine learning algorithms, decision tree classifiers require several hyperparameters to be set for optimal performance, such as the maximum depth of the tree, the minimum number of instances at a node for inducing a split, the splitting criterion, and the complexity parameter that controls the amount of pruning. The suggested approach tunes four hyperparameters of recursive partitioning and regression trees: i) the minimum number of instances that must exist in a node for a split to be attempted; ii) the maximum depth of the tree; iii) the splitting criterion; and iv) the complexity parameter that controls the amount of pruning. The motivation of this experimental research is to improve the accuracy and reduce the size of the decision tree classifiers. The last section concludes the research and points toward novel research directions.
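To illustrate the search itself, the following is a minimal sketch of an evolutionary hyperparameter search over these four rpart parameters. It is not the authors' HEARpart implementation: the function names, mutation scheme, and population settings are illustrative assumptions, and it presumes a data frame data with a factor column class.

library(rpart)

# One candidate = one setting of the four tuned hyperparameters.
random_candidate <- function() {
  list(
    minsplit = sample(2:50, 1),                      # min. instances to attempt a split
    maxdepth = sample(2:30, 1),                      # maximum tree depth (rpart caps this at 30)
    split    = sample(c("gini", "information"), 1),  # splitting criterion
    cp       = runif(1, 0.0001, 0.05)                # complexity parameter (pruning)
  )
}

# Fitness: 10-fold cross-validated error rate of the induced tree.
fitness <- function(cand, data) {
  folds <- sample(rep(1:10, length.out = nrow(data)))
  errs <- sapply(1:10, function(k) {
    fit <- rpart(class ~ ., data = data[folds != k, ], method = "class",
                 parms = list(split = cand$split),
                 control = rpart.control(minsplit = cand$minsplit,
                                         maxdepth = cand$maxdepth,
                                         cp = cand$cp))
    pred <- predict(fit, data[folds == k, ], type = "class")
    mean(pred != data$class[folds == k])
  })
  mean(errs)
}

# A bare-bones evolutionary loop: keep the better half, mutate it to refill.
evolve <- function(data, pop_size = 10, generations = 20) {
  pop <- replicate(pop_size, random_candidate(), simplify = FALSE)
  for (g in 1:generations) {
    scores   <- sapply(pop, fitness, data = data)
    elite    <- pop[order(scores)][1:(pop_size %/% 2)]
    children <- lapply(elite, function(p) {
      p$minsplit <- max(2, p$minsplit + sample(-3:3, 1))
      p$maxdepth <- min(30, max(2, p$maxdepth + sample(-2:2, 1)))
      p$cp       <- max(0.0001, p$cp * runif(1, 0.5, 2))
      if (runif(1) < 0.2) p$split <- sample(c("gini", "information"), 1)
      p
    })
    pop <- c(elite, children)
  }
  pop[[which.min(sapply(pop, fitness, data = data))]]
}

A typical use would be best <- evolve(data), followed by refitting a single tree on the full training set with the returned setting.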