Abstract

Learning from class-imbalanced data is difficult and often causes classifiers to fail to identify the minority class. To balance the class ratio, the synthetic minority oversampling technique (SMOTE) improves minority-class classification by generating synthetic minority instances. However, in some scenarios SMOTE and its extensions generate noisy instances, which degrades performance. This is because they are built on kNN (k nearest neighbors), which cannot identify the class distribution between a pair of minority instances. Furthermore, how many synthetic instances should be generated remains an open question in this field. To address these issues, we propose a new algorithm named Region-Impurity Synthetic Minority Oversampling Technique (RIOT). Specifically, given a region radius, we locate the neighbors of each minority instance, use the class ratio within the region to identify the relatively hard-to-learn minority instances, and select these as the base for sample generation. Synthetic instances are then generated until the region is approximately balanced. In experiments on twelve real-world datasets, RIOT outperformed some SMOTE extensions on several model performance indicators while generating fewer synthetic instances.
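
The region-based idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the impurity measure, the generation rule, or the stopping criterion, so everything below is an assumption made for illustration. The function name `region_oversample`, the Euclidean distance, the rule "hard-to-learn means more majority than minority instances inside the region", SMOTE-style linear interpolation toward a minority neighbor in the same region, and counting only the original instances when measuring balance are all hypothetical choices.

```python
import numpy as np


def region_oversample(X, y, radius, minority=1, seed=0):
    """Illustrative region-based oversampling (assumed rules, not the paper's)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority
    synthetic = []

    for i in np.flatnonzero(is_min):
        # Region = all instances within `radius` of the minority seed (seed included).
        in_region = np.linalg.norm(X - X[i], axis=1) <= radius
        n_min = int(np.sum(in_region & is_min))
        n_maj = int(np.sum(in_region & ~is_min))

        # Assumption: a seed is hard to learn when its region is majority-dominated.
        deficit = n_maj - n_min
        if deficit <= 0:
            continue

        # Assumption: interpolate only toward minority neighbors inside the region.
        partners = np.flatnonzero(in_region & is_min)
        partners = partners[partners != i]
        if partners.size == 0:
            continue  # simplification: isolated seeds are skipped

        # Generate just enough points to balance the region's original counts.
        for _ in range(deficit):
            j = rng.choice(partners)
            gap = rng.random()
            synthetic.append(X[i] + gap * (X[j] - X[i]))

    return np.array(synthetic).reshape(-1, X.shape[1])
```

Because the number of generated points is tied to each region's class deficit, the total is determined by the data rather than by a global oversampling ratio, which is the property the abstract emphasizes. Regions of nearby seeds overlap in this sketch, so the result is only approximately balanced.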
