Abstract

The lazy decision tree (LazyDT) algorithm constructs a customized decision tree for each test instance, consisting of only a single path from the root to a leaf node. LazyDT has two strengths over eager decision trees: it can build shorter decision paths, and it can avoid unnecessary data fragmentation. However, the split criterion LazyDT uses to construct a customized tree is information gain, which is skew-sensitive; when learning from imbalanced data sets, this sensitivity to class imbalance impedes its ability to learn the minority class concept. In this paper, we use Hellinger distance and Kullback-Leibler (K-L) divergence as split criteria to build two types of lazy decision trees. Experiments across a wide range of imbalanced data sets investigate the effectiveness of our methods in comparison with other methods, including the lazy decision tree, C4.5, the Hellinger distance decision tree, and the support vector machine. In addition, we use SMOTE to preprocess the highly imbalanced data sets in the experiments and evaluate its effectiveness. The experimental results, compared through nonparametric statistical tests, demonstrate that using Hellinger distance and K-L divergence as split criteria can effectively improve the performance of LazyDT on imbalanced classification.
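To illustrate the skew-insensitive splitting idea referred to above, the following is a minimal sketch, assuming a binary class problem and a candidate binary split, of how a Hellinger-distance split value can be computed. The function name `hellinger_split_value` and its inputs are illustrative assumptions, not code from the paper; unlike information gain, the criterion uses only within-class partition proportions, so it does not depend on the class priors.

```python
import numpy as np

def hellinger_split_value(y_left, y_right, pos_label=1):
    """Hellinger distance between the class-conditional distributions induced
    by a candidate binary split (larger values indicate a better split).
    Sketch of the skew-insensitive criterion used by Hellinger-distance trees."""
    y_left = np.asarray(y_left)
    y_right = np.asarray(y_right)
    y_all = np.concatenate([y_left, y_right])

    n_pos = np.sum(y_all == pos_label)   # total positives at this node
    n_neg = np.sum(y_all != pos_label)   # total negatives at this node
    if n_pos == 0 or n_neg == 0:
        return 0.0                       # node is pure; nothing to split

    total = 0.0
    for part in (y_left, y_right):
        p_pos = np.sum(part == pos_label) / n_pos   # share of positives in this partition
        p_neg = np.sum(part != pos_label) / n_neg   # share of negatives in this partition
        total += (np.sqrt(p_pos) - np.sqrt(p_neg)) ** 2
    return np.sqrt(total)

# Example: a highly imbalanced node (1 positive, 9 negatives).
left = [1, 0, 0, 0]
right = [0, 0, 0, 0, 0, 0]
print(hellinger_split_value(left, right))  # ~0.92; unaffected by class skew
```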
