Abstract

Imbalanced data classification addresses data whose classes contain markedly different numbers of samples. Because traditional classifiers do not account for the imbalanced class distribution, they perform poorly on such data. Four main families of solutions address this problem: modifying the data distribution, modifying the learning algorithm to account for the imbalanced representation, incorporating the costs of misclassifying samples, and ensemble methods. In this paper, we adopt the first type of solution and resample the imbalanced data in two phases: the data is first over-sampled, and then noisy and borderline samples are identified and removed (the under-sampling phase). For the under-sampling phase, we introduce two robust extensions of the KNN classifier, based on interval-valued fuzzy and intuitionistic fuzzy sets, to filter the noisy and borderline samples of imbalanced data. The process of identifying and removing these samples is iterative (the Iterative Partitioning Filter), so the proposed filter exploits a classifier ensemble. We also propose a new voting rule for the new interval-valued intuitionistic fuzzy-based classifiers. The characteristics of the proposed method are examined in a comprehensive experimental study comparing it with SMOTE and its best-known generalizations. Experiments on synthetic data sets with different noise levels and borderline shapes, as well as on real-world data sets, demonstrate the potential of the proposed method.
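
The Python sketch below illustrates the general two-phase pipeline the abstract describes, under simplifying assumptions: imbalanced-learn's SMOTE stands in for the over-sampling phase, a plain scikit-learn KNN stands in for the paper's interval-valued and intuitionistic fuzzy KNN extensions, and a simple majority vote stands in for the proposed voting rule. Function names, fold counts, and the stopping threshold are illustrative choices, not values taken from the paper.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
from imblearn.over_sampling import SMOTE


def iterative_partitioning_filter(X, y, n_partitions=5, max_rounds=10, stop_frac=0.01):
    """Iteratively remove samples that a partition-wise KNN ensemble misclassifies.

    Sketch of an Iterative-Partitioning-Filter-style clean-up; the paper's fuzzy
    KNN variants and voting rule are replaced by plain KNN and majority voting.
    """
    X, y = np.asarray(X), np.asarray(y)
    for _ in range(max_rounds):
        kf = KFold(n_splits=n_partitions, shuffle=True, random_state=0)
        misclassified_votes = np.zeros(len(y), dtype=int)
        # Train one KNN on each partition and let it vote on every sample.
        for _, part_idx in kf.split(X):
            clf = KNeighborsClassifier(n_neighbors=5).fit(X[part_idx], y[part_idx])
            misclassified_votes += (clf.predict(X) != y).astype(int)
        # A sample is flagged as noisy/borderline if most ensemble members misclassify it.
        noisy = misclassified_votes > n_partitions // 2
        if noisy.sum() <= stop_frac * len(y):
            break  # stop iterating once only a negligible fraction is flagged
        X, y = X[~noisy], y[~noisy]
    return X, y


def resample(X, y):
    """Phase 1: SMOTE over-sampling.  Phase 2: iterative noise/borderline filtering."""
    X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
    return iterative_partitioning_filter(X_over, y_over)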
