Abstract

Pruning is applied to combat the over-fitting problem: the tree is pruned back with the goal of identifying the decision tree with the lowest error rate on previously unobserved instances, breaking ties in favour of smaller trees with high accuracy. In this paper, pruning with Bayes minimum risk is introduced for estimating the risk-rate. The method proceeds in a bottom-up fashion, converting the parent node of a subtree to a leaf node if the estimated risk-rate of the parent node is less than the combined risk-rate of the subtree's leaves. The proposed post-pruning method is evaluated against several standards, including attribute selection, accuracy, tree complexity, time taken to prune the tree, precision and recall scores, TP/FP rates, and area under the ROC curve. The experimental results show that the proposed method produces better classification accuracy, with a complexity close to that of the reduced-error pruning and minimum-error pruning approaches. The experiments also demonstrate that the proposed method performs satisfactorily in terms of precision score, recall score, TP rate, FP rate, and area under the ROC curve.
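The abstract does not define how the risk-rate of a node is estimated; a minimal sketch of one common Bayes-minimum-risk estimate is shown below, assuming Laplace-smoothed class posteriors and a user-supplied loss matrix. The function name `node_risk_rate`, the smoothing constant `alpha`, and the loss-matrix convention are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def node_risk_rate(class_counts, loss_matrix, alpha=1.0):
    """Estimate the Bayes minimum risk of predicting a single class at a node.

    class_counts : per-class counts of training instances reaching the node
    loss_matrix  : loss_matrix[j][c] = loss of predicting class j when the
                   true class is c (0/1 loss reduces this to an error rate)
    alpha        : Laplace smoothing constant for the class posteriors
    """
    counts = np.asarray(class_counts, dtype=float)
    loss = np.asarray(loss_matrix, dtype=float)
    # Laplace-smoothed posterior estimate p(c | node)
    posterior = (counts + alpha) / (counts.sum() + alpha * counts.size)
    # Expected loss of each candidate prediction j: sum_c loss[j, c] * p(c | node)
    expected_loss = loss @ posterior
    # Bayes minimum risk: the expected loss of the best (lowest-risk) prediction
    return float(expected_loss.min())
```

With a 0/1 loss matrix this reduces to one minus the (smoothed) majority-class probability, i.e. the usual error-rate estimate; an asymmetric loss matrix is what distinguishes a risk-rate from a plain error-rate.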

Highlights

  • The decision tree is one of the most powerful and efficient techniques in data mining and has been widely used by researchers [1,2,3]

  • The experimental results show that the proposed method produces better classification accuracy, with complexity comparable to that of the reduced-error pruning and minimum-error pruning approaches

  • The experiments demonstrate that the proposed method shows satisfactory performance in terms of precision score, recall score, True Positive (TP) rate, False Positive (FP) rate and area under the Receiver Operating Characteristic (ROC) curve


Summary

Introduction

The decision tree is one of the most powerful and efficient techniques in data mining and has been widely used by researchers [1,2,3]. Over-fitting arises during the classification process and leads to a complex tree that is large and difficult to understand. Pruning methods combat this problem by removing non-productive, meaningless branches and thereby avoiding unnecessary tree complexity. To this end, we adopt a post-pruning approach and introduce a new post-pruning method, called Pruning with Bayes Minimum Risk (PBMR), that aims to achieve high accuracy with a reduced tree size. Whereas existing post-pruning algorithms estimate the misclassification error at each decision node, PBMR estimates the risk-rate of a node and of its leaves and propagates this quantity up the tree instead of the misclassification error.
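As a rough illustration of the bottom-up procedure described above, the sketch below prunes children first and then collapses a decision node into a leaf whenever its own estimated risk-rate does not exceed the instance-weighted risk-rate of the leaves beneath it. The `Node` structure, the `risk_fn` parameter, and the choice to weight leaf risks by instance counts are assumptions made for illustration, not the paper's exact formulation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    class_counts: List[int]                      # per-class instance counts at this node
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children

def subtree_leaf_risk(node: Node, risk_fn) -> float:
    """Instance-weighted average risk-rate of the leaves under `node`."""
    if node.is_leaf:
        return risk_fn(node.class_counts)
    total = sum(sum(child.class_counts) for child in node.children)
    return sum(sum(child.class_counts) / total * subtree_leaf_risk(child, risk_fn)
               for child in node.children)

def prune_pbmr(node: Node, risk_fn) -> Node:
    """Bottom-up pruning: convert a parent node into a leaf when its own
    estimated risk-rate is no worse than the combined risk-rate of its leaves."""
    if node.is_leaf:
        return node
    node.children = [prune_pbmr(child, risk_fn) for child in node.children]
    if risk_fn(node.class_counts) <= subtree_leaf_risk(node, risk_fn):
        node.children = []                       # collapse the subtree into a leaf
    return node
```

Here `risk_fn` would be a node-level Bayes-minimum-risk estimate such as the one sketched after the abstract, applied with a chosen loss matrix, e.g. `prune_pbmr(root, lambda counts: node_risk_rate(counts, loss_matrix))`.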

Motivation
Related works
Experimental results and discussions
Conclusion
Future work
