Abstract

Learning from imbalanced datasets is a challenging problem that plays an important role in the data mining community. Traditional splitting criteria such as information gain are sensitive to the class distribution. To overcome this weakness, Cieslak and Chawla proposed the Hellinger Distance Decision Tree (HDDT). Although HDDT outperforms traditional decision trees, other skew-insensitive splitting criteria may exist. In this paper, we propose several new skew-insensitive splitting criteria for constructing decision trees and apply a comprehensive empirical evaluation framework, testing against commonly used sampling and ensemble methods across 58 datasets. The experimental results demonstrate the superiority of these skew-insensitive decision trees on highly imbalanced datasets and their competitive performance on datasets with a low level of imbalance; among them, the K-L divergence-based decision tree (KLDDT) is the most robust.
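As a concrete illustration of the kind of criterion the abstract refers to (this sketch is not taken from the paper itself), the Hellinger distance splitting criterion behind HDDT can be computed for a candidate binary split of a two-class node as below. The helper name hellinger_split_value and the binary-class, two-branch setting are assumptions made for this sketch.

    import numpy as np

    def hellinger_split_value(y_left, y_right, pos_label=1):
        # Hellinger distance between the class-conditional branch
        # distributions P(branch | +) and P(branch | -), the criterion
        # behind HDDT. It never uses the class priors P(+) and P(-),
        # which is what makes it skew-insensitive.
        y_left, y_right = np.asarray(y_left), np.asarray(y_right)
        n_pos = np.sum(y_left == pos_label) + np.sum(y_right == pos_label)
        n_neg = len(y_left) + len(y_right) - n_pos
        if n_pos == 0 or n_neg == 0:
            return 0.0  # a pure node offers no class separation to split on
        d = 0.0
        for branch in (y_left, y_right):
            p_pos = np.sum(branch == pos_label) / n_pos  # P(branch | +)
            p_neg = np.sum(branch != pos_label) / n_neg  # P(branch | -)
            d += (np.sqrt(p_pos) - np.sqrt(p_neg)) ** 2
        return np.sqrt(d)

    # Example: labels routed left vs. right by a candidate threshold.
    print(hellinger_split_value([1, 0, 0, 0, 0], [1, 1, 0, 0]))

Because the value depends only on how each class distributes over the branches, rebalancing the classes leaves the ranking of candidate splits unchanged. A K-L divergence analogue in the spirit of KLDDT would substitute a divergence between the same branch distributions, though the paper's exact formulation should be consulted for that criterion.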
