Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning

Tao Wang, Zhenxing Qin, Zhi Jin, Shichao Zhang

doi:10.1016/j.jss.2010.01.002

Abstract

Cost-sensitive learning algorithms are typically designed to minimize total cost when multiple costs are taken into account. Like other learning algorithms, they face a significant challenge in applied settings: over-fitting. They can produce good results on training data yet fail to yield an optimal model when applied to unseen data in real-world applications. This paper addresses over-fitting with three simple and efficient strategies for the TCSDT (test cost-sensitive decision tree) method: feature selection, smoothing, and threshold pruning. Feature selection pre-processes the data set before the TCSDT algorithm is applied. Smoothing and threshold pruning are applied within the TCSDT algorithm before the class probability estimate is calculated for each decision tree leaf. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing over-fitting.
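To make the role of smoothing at a leaf concrete, here is a minimal sketch. It is not the paper's implementation. The abstract does not say which smoothing scheme is used, so the sketch assumes a Laplace correction. The function names, the leaf counts, and the cost matrix are all hypothetical. The sketch shows how a smoothed class probability estimate can change the minimum-expected-cost label of a small, pure leaf.

```python
from typing import List, Sequence


def laplace_smoothed_probs(class_counts: Sequence[int], alpha: float = 1.0) -> List[float]:
    """Class probability estimates for one leaf with a Laplace correction (assumed scheme)."""
    total = sum(class_counts) + alpha * len(class_counts)
    return [(count + alpha) / total for count in class_counts]


def min_expected_cost_label(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    """Pick the label with the lowest expected misclassification cost.

    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical leaf holding 10 training examples, all of class 0.
leaf_counts = [10, 0]
# Hypothetical costs: missing a class-1 case costs 20, a false alarm costs 1.
cost_matrix = [[0.0, 1.0], [20.0, 0.0]]

raw_probs = [c / sum(leaf_counts) for c in leaf_counts]
smoothed_probs = laplace_smoothed_probs(leaf_counts)

raw_label = min_expected_cost_label(raw_probs, cost_matrix)
smoothed_label = min_expected_cost_label(smoothed_probs, cost_matrix)
```

With the raw frequencies the leaf is treated as certainly class 0, so class 0 is predicted at zero expected cost. After smoothing, the leaf keeps a small probability for class 1, and the high cost of missing that class moves the decision to class 1. An over-confident estimate from a few training examples is the kind of over-fitting that smoothing is meant to temper.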

Journal: Journal of Systems and Software
Publication Date: Jan 25, 2010
Citations: 47

Similar Papers

Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection
Jianping Fan ... Ling Gao
Pattern Recognition | VOL. 48
31 Oct 2014

Cost sensitive active learning based on self-training
Yongcheng Wu
01 May 2014

Cost-Time Sensitive Decision Tree with Missing Values
Shichao Zhang ... Jilian Zhang
07 Dec 2018

Evaluation of Cost Sensitive Learning for Imbalanced Bank Direct Marketing Data
Khor Kok-Chin ... Ng Keng-Hoong
Indian Journal of Science and Technology | VOL. 9
15 Nov 2016
