Abstract

The class imbalance problem occurs frequently in real applications: in the training set, one class may have far fewer examples than another. Under-sampling is a popular approach to this problem. It is efficient because it uses only a subset of the majority class, but it has the drawback of discarding many potentially useful majority class examples. To overcome this drawback, we apply an unsupervised learning technique to a supervised learning task. We propose cluster-based majority under-sampling approaches that select a representative subset from the majority class. Compared with random under-sampling, cluster-based under-sampling can effectively avoid losing important majority class information. We adopt two methods to select a representative subset from k clusters in certain proportions, and then use this subset together with all minority class samples as training data to improve accuracy on both the minority and majority classes. We compare our approaches with the traditional random under-sampling approach on ten UCI repository datasets using two classifiers: k-nearest neighbor and Naïve Bayes. Recall, Precision, F-measure, G-mean, and BACC (balanced accuracy) are used to evaluate classifier performance. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach, and that they attain better overall performance with the k-nearest neighbor classifier than with the Naïve Bayes classifier.
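
The procedure outlined above can be pictured with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm, the two selection methods, or the proportions, so the sketch assumes plain k-means, a per-cluster quota proportional to cluster size, and a target subset size equal to the number of minority samples. The function names `kmeans`, `cluster_undersample`, and `g_mean_and_bacc` are hypothetical. G-mean and BACC follow their standard definitions in terms of the true positive and true negative rates.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clusterer); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def cluster_undersample(majority, minority, k=3, seed=0):
    """Select roughly len(minority) majority samples, spread across k clusters.

    Assumption: each cluster contributes in proportion to its size, with at
    least one sample from every non-empty cluster.
    """
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    target = len(minority)
    subset = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(target * len(members) / len(majority)))
        subset.extend(rng.sample(members, min(quota, len(members))))
    # Training data = representative majority subset + all minority samples.
    return subset, list(minority)


def g_mean_and_bacc(tp, fn, tn, fp):
    """G-mean = sqrt(TPR * TNR); BACC = (TPR + TNR) / 2."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr), (tpr + tnr) / 2


# Toy data: 90 majority points in three blobs, 10 minority points.
rng = random.Random(1)
majority = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
minority = [(rng.gauss(2.5, 0.3), rng.gauss(2.5, 0.3)) for _ in range(10)]
subset, minority_kept = cluster_undersample(majority, minority, k=3)
print(len(subset), len(minority_kept))
```

Because each cluster's quota is rounded independently, the selected subset can differ from the target size by a few samples. A different proportion rule, such as one weighted by each cluster's distance to the minority class, would change only the `quota` line.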
