Cluster-Based Under-Sampling Using Farthest Neighbour Technique for Imbalanced Datasets

G Rekha, Amit Kumar Tyagi

doi:10.1007/978-3-030-49339-4_5

Abstract

In the domain of data mining, learning from datasets with an imbalanced class distribution is a challenging problem for conventional classifiers. In real-world classification problems, data samples often have an unequal class distribution: class imbalance exists when one class has far fewer samples than the other classes. This is known as the class imbalance problem. Many solutions have been proposed in the literature to improve classifier performance on such data. However, recent works argue that an imbalanced dataset is not a problem in itself; the degradation of classifier performance is also linked to other factors such as small sample size, overlapping samples, and class disjuncts. In this work, we propose cluster-based under-sampling based on farthest neighbours. Within each cluster, the majority class samples selected are those whose average distance to all minority class samples in the cluster is the largest. The experimental results show that our cluster-based under-sampling approach outperforms existing techniques from previous studies.

Keywords: Classification, Clustering, Class disjunct, Imbalance problems, Majority samples, Minority samples
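The selection rule described in the abstract can be sketched in Python. This is a minimal, hypothetical sketch under stated assumptions, not the authors' implementation. The abstract does not name the clustering algorithm, the number of clusters, or how many majority samples each cluster keeps, so the sketch assumes plain k-means (the illustrative helper `kmeans`), a per-cluster budget proportional to that cluster's share of the majority class (so the total kept is roughly the number of minority samples), and, for clusters that contain no minority samples, distances measured to all minority samples in the dataset. The function name `farthest_neighbour_undersample` and its parameters are likewise hypothetical.

```python
import math
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def farthest_neighbour_undersample(X, y, k=3, majority=0, minority=1, seed=0):
    """Return sorted indices of retained samples: all minority + selected majority."""
    all_min = [i for i, cls in enumerate(y) if cls == minority]
    all_maj = [i for i, cls in enumerate(y) if cls == majority]
    if not all_min:
        raise ValueError("no minority samples")

    # Group majority and minority indices by cluster label.
    labels = kmeans(X, k, seed=seed)
    maj_by_cluster = defaultdict(list)
    min_by_cluster = defaultdict(list)
    for i, (lab, cls) in enumerate(zip(labels, y)):
        if cls == majority:
            maj_by_cluster[lab].append(i)
        elif cls == minority:
            min_by_cluster[lab].append(i)

    keep = list(all_min)  # every minority sample is retained
    for lab, maj in maj_by_cluster.items():
        # Assumption: a cluster without minority samples measures distance
        # to all minority samples in the dataset instead.
        ref = min_by_cluster.get(lab) or all_min
        # Assumption: keep a share proportional to the cluster's majority count.
        budget = max(1, round(len(maj) * len(all_min) / len(all_maj)))

        def avg_dist(i):
            # Average Euclidean distance from majority sample i to the reference minority set.
            return sum(math.dist(X[i], X[j]) for j in ref) / len(ref)

        # Farthest neighbours first: largest average distance to the minority class.
        ranked = sorted(maj, key=avg_dist, reverse=True)
        keep.extend(ranked[:budget])
    return sorted(keep)


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (10, 10), (0.5, 0.5)]
    y = [1, 1, 0, 0, 0, 0]
    # With one cluster, the two majority samples farthest from the minority
    # class, (5, 5) and (10, 10), are kept alongside both minority samples.
    print(farthest_neighbour_undersample(X, y, k=1))  # → [0, 1, 3, 4]
```

Keeping the farthest majority samples plausibly thins out majority points in regions that overlap the minority class, one of the factors the abstract links to degraded performance; this is an interpretation of the selection rule, not a claim taken from the paper.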
