Predicting Classifiers Efficacy in Relation with Data Complexity Metric Using Under-Sampling Techniques

Deepika Singh, Anjana Gosain, Anju Saha

doi:10.1007/978-981-16-3346-1_7

Abstract

In imbalanced classification tasks, the training datasets may also suffer from other problems such as class overlapping, small disjuncts and low-density classes. In such situations, learning of the minority class becomes imprecise. Data complexity metrics help identify the relationship between a classifier's learning accuracy and the characteristics of a dataset. This paper presents an experimental study on imbalanced datasets in which the dwCM complexity metric is used to group the datasets by complexity level; the behavior of under-sampling based pre-processing techniques is then analyzed for each of these groups. Experiments are conducted on 22 real-life datasets with different levels of imbalance, class overlapping and class density. The experimental results show that the groups formed using the dwCM metric can better explain the difficulty of imbalanced datasets and help predict how classifiers respond to under-sampling algorithms.
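The abstract does not say which under-sampling algorithms were evaluated, and it does not define dwCM, so neither is reproduced here. As a minimal illustration of the general idea only, the sketch below uses plain random under-sampling, the simplest member of the family and not necessarily one the study used. It also computes the common imbalance ratio (largest class size divided by smallest class size) before and after resampling. The function names `imbalance_ratio` and `random_undersample` are hypothetical helpers written for this example.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Imbalance ratio: size of the largest class divided by size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly discard examples from the larger classes
    until every class has as many examples as the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())

    # Group feature vectors by their class label.
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)

    # Keep a random subset of size `target` from each class (all of the smallest class).
    X_out, y_out = [], []
    for label, items in by_class.items():
        kept = rng.sample(items, target)
        X_out.extend(kept)
        y_out.extend([label] * target)
    return X_out, y_out


if __name__ == "__main__":
    X = list(range(12))       # toy one-dimensional "features"
    y = [0] * 10 + [1] * 2    # 10 majority examples, 2 minority examples
    print(imbalance_ratio(y))              # prints 5.0
    X_res, y_res = random_undersample(X, y)
    print(imbalance_ratio(y_res))          # prints 1.0
```

Random removal ignores where majority examples sit relative to the minority class, so it can discard informative points near overlapping regions. This is one reason the response of a classifier to under-sampling may depend on dataset complexity, as the abstract suggests.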
