Abstract

In data mining, learning from datasets with imbalanced class distributions is a challenging problem for conventional classifiers. Class imbalance exists when the number of samples in one class is much smaller than in the other classes. In real-world classification problems, data samples often have unequal class distributions; this is known as the class imbalance problem. Many solutions have been proposed in the literature to improve classifier performance. However, recent works claim that an imbalanced dataset is not a problem in itself: the degradation of classifier performance is also linked to factors such as small sample size, sample overlapping, class disjuncts and others. In this work, we propose cluster-based under-sampling based on farthest neighbors. Within each cluster, the majority class samples selected are those whose average distance to all minority class samples in the cluster is the farthest. The experimental results show that our cluster-based under-sampling approach outperforms the existing techniques from previous studies.

Keywords: Classification; Clustering; Class disjunct; Imbalance problems; Majority samples; Minority samples
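
The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or how many majority samples are kept, so the k-means routine, the parameters `k` and `keep_per_minority`, and the function name `farthest_neighbor_undersample` are all assumptions made for illustration. Clusters that contain no minority samples contribute no majority samples here, which is likewise an assumption.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means stand-in (assumption: the abstract does not name the clusterer)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def farthest_neighbor_undersample(X, y, minority=1, k=2, keep_per_minority=1, seed=0):
    """Return sorted indices of the samples kept after under-sampling.

    X is a list of coordinate tuples and y a list of class labels.
    keep_per_minority is a hypothetical knob: majority samples kept per
    minority sample in each cluster.
    """
    labels = kmeans(X, k, seed=seed)
    kept = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mino = [i for i in idx if y[i] == minority]
        majo = [i for i in idx if y[i] != minority]
        kept.extend(mino)  # minority samples are always kept
        if not mino:
            continue  # assumption: no minority samples, so keep no majority ones

        def avg_dist(i):
            # Average distance from majority sample i to the cluster's minority samples.
            return sum(math.dist(X[i], X[j]) for j in mino) / len(mino)

        # Keep the majority samples with the largest average distance.
        ranked = sorted(majo, key=avg_dist, reverse=True)
        kept.extend(ranked[:keep_per_minority * len(mino)])
    return sorted(kept)


# Toy data: four majority samples (label 0) and one minority sample (label 1).
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0), (0.5, 0.0)]
y = [0, 0, 0, 0, 1]
print(farthest_neighbor_undersample(X, y, k=1))  # [3, 4]
```

With a single cluster, the majority sample at (9.0, 0.0) lies farthest from the lone minority sample at (0.5, 0.0), so it is the one retained alongside that minority sample.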
