Abstract
Pruning is applied to combat the over-fitting problem: the tree is pruned back with the goal of identifying the decision tree with the lowest error rate on previously unobserved instances, breaking ties in favour of smaller trees with high accuracy. In this paper, pruning with Bayes minimum risk is introduced for estimating the risk-rate. The method proceeds in a bottom-up fashion, converting the parent node of a subtree to a leaf node if the estimated risk-rate of the parent node for that subtree is less than the risk-rate of its leaves. The proposed post-pruning method is assessed against several evaluation standards: attribute selection, accuracy, tree complexity, time taken to prune the tree, precision/recall scores, TP/FN rates and area under ROC. The experimental results show that the proposed method produces better classification accuracy and that its complexity is not much different from the complexities of the reduced-error pruning and minimum-error pruning approaches. The experiments also demonstrate that the proposed method shows satisfactory performance in terms of precision score, recall score, TP rate, FP rate and area under ROC.
Highlights
Decision trees are one of the most powerful and efficient techniques in data mining and have been widely used by researchers [1,2,3]
The experimental results show that the proposed method produces better classification accuracy and that its complexity is not much different from the complexities of the reduced-error pruning and minimum-error pruning approaches
The experiments demonstrate that the proposed method shows satisfactory performance in terms of precision score, recall score, True Positive (TP) rate, False Positive (FP) rate and area under Receiver Operating Characteristic (ROC)
Summary
Decision trees are one of the most powerful and efficient techniques in data mining and have been widely used by researchers [1,2,3]. We adopt a post-pruning approach to combat the over-fitting problem, which arises during data classification and leads to a large, complex tree that is difficult to understand. Pruning methods combat this problem by removing non-productive and meaningless branches, avoiding unnecessary tree complexity. To this end, a new post-pruning method called Pruning with Bayes Minimum Risk (PBMR) is introduced to achieve high accuracy with reduced tree size. Whereas other post-pruning algorithms estimate the misclassification error at each decision node, PBMR estimates the risk-rate of a node and of its leaves and propagates this estimate up the tree.
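The bottom-up rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the text here does not give the risk-rate formula, so the sketch assumes that the risk of a node is the minimum expected loss over candidate labels, with class posteriors estimated from Laplace-smoothed training counts, and that the risk of a subtree is the instance-weighted average of its leaf risks. The names `Node`, `bayes_min_risk`, `subtree_risk` and `prune` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Class counts of the training instances that reach this node.
    # Assumption: every node is non-empty and the children's counts
    # sum to the parent's counts.
    counts: Dict[str, int]
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def posterior(counts, classes):
    # Assumption: Laplace-smoothed class frequencies as posteriors.
    total = sum(counts.get(c, 0) for c in classes)
    return {c: (counts.get(c, 0) + 1) / (total + len(classes)) for c in classes}


def bayes_min_risk(counts, loss, classes):
    # Expected loss of predicting label a is sum_c loss[a][c] * P(c | node);
    # the Bayes minimum risk is the smallest such value over all labels.
    post = posterior(counts, classes)
    return min(sum(loss[a][c] * post[c] for c in classes) for a in classes)


def subtree_risk(node, loss, classes):
    # Risk of a subtree: leaf risks weighted by the share of instances
    # each branch receives (an assumption of this sketch).
    if node.is_leaf:
        return bayes_min_risk(node.counts, loss, classes)
    n = sum(node.counts.values())
    return sum(
        sum(child.counts.values()) / n * subtree_risk(child, loss, classes)
        for child in node.children
    )


def prune(node, loss, classes):
    # Visit children first so that pruning proceeds bottom-up.
    for child in node.children:
        prune(child, loss, classes)
    # Convert the parent to a leaf when its own risk is strictly lower
    # than the risk of the subtree rooted at it.
    if node.children and bayes_min_risk(node.counts, loss, classes) < subtree_risk(
        node, loss, classes
    ):
        node.children = []
    return node


# Usage with a 0-1 loss matrix (an illustrative choice, not from the paper).
classes = ["yes", "no"]
loss = {a: {c: 0.0 if a == c else 1.0 for c in classes} for a in classes}
tree = Node({"yes": 8, "no": 2},
            [Node({"yes": 4, "no": 1}), Node({"yes": 4, "no": 1})])
prune(tree, loss, classes)
print(tree.is_leaf)
```

In this example the split does not separate the classes, so the smoothed risk of the parent is lower than that of its two small leaves and the subtree collapses to a single leaf. A non-uniform loss matrix can be passed in the same way to penalise some misclassifications more heavily than others.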