Abstract

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. It arises in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with boosting. The experimental results show that the proposed algorithm outperforms the sampling-based ensemble algorithms from previous studies.

Highlights

  • Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. It arises in many application areas, such as rare medical diagnosis, risk management, and fault detection

  • We propose an effective ensemble method to undersample the majority class and solve the imbalanced data classification problem

  • The K-Means algorithm is implemented using the R package “Cluster”, and the support vector machine is implemented using the R package “e1071”


Summary

Introduction

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. It arises in many application areas, such as rare medical diagnosis, risk management, and fault detection. Example applications include identifying people infected with a rare disease among a healthy population, detecting oil spills in satellite radar images, detecting fraudulent calls, detecting fraudulent credit card transactions, and intrusion detection. Imbalanced data have a skewed distribution in which the classes contain instances in different proportions. The imbalance ratio between the classes is expressed as the class imbalance degree. The class that contains fewer instances is called the minority class, and the class that contains more instances is called the majority class.
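
To make the terminology concrete, the following is a minimal sketch of how the majority class, the minority class, and an imbalance ratio might be computed from a list of labels. It assumes the ratio is defined as the majority count divided by the minority count, which is one common convention and is not stated in the text above. The function name `imbalance_ratio` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (majority_label, minority_label, ratio) for a list of class labels.

    Assumption: the ratio is majority count / minority count.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    ordered = counts.most_common()  # sorted from most to least frequent
    majority_label, majority_count = ordered[0]
    minority_label, minority_count = ordered[-1]
    return majority_label, minority_label, majority_count / minority_count


# Hypothetical toy data: 95 healthy patients and 5 with a rare disease.
labels = ["healthy"] * 95 + ["rare_disease"] * 5
print(imbalance_ratio(labels))  # ('healthy', 'rare_disease', 19.0)
```

Under this convention, a ratio of 19.0 means there are 19 majority instances for every minority instance. A classifier that always predicts the majority class would reach 95% accuracy on this toy data while detecting no minority cases.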
