Imbalanced datasets pose a frequent and challenging problem in many real-world applications. Classification models are often biased towards the majority class when learning from class-imbalanced data. Typical imbalanced learning (IL) approaches, e.g., SMOTE, AdaCost, and Cascade, often suffer from poor performance in complex tasks where class overlap or a high imbalance ratio occurs. In this paper, we systematically investigate the IL problem and propose a novel framework named sparse projection infinite selection ensemble (SPISE). SPISE iteratively resamples balanced subsets and combines the classifiers trained on these subsets for imbalanced classification. Both the diversity of the classifier ensemble and the similarity between the subsets and the whole dataset are considered in this process. Specifically, we present a graph-based approach named infinite subset selection to adaptively sample diverse and similar subsets. Additionally, a random sparse projection is combined with feature selection at the beginning of each iteration to augment the training features and enhance the diversity of the generated subsets. SPISE can be easily adapted to most existing classifiers (e.g., support vector machine and random forest) to boost their performance for IL. Quantitative experiments on 26 imbalanced benchmark datasets substantiate the effectiveness and superiority of the proposed model compared with other popular approaches.
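The resample-train-combine loop that SPISE builds on can be illustrated with a minimal sketch. Note the heavy simplifications: SPISE's graph-based infinite subset selection and its random sparse projection are not reproduced here; plain random undersampling stands in for subset selection, and a toy nearest-centroid classifier stands in for the base learner. All names and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_subset(X, y, rng):
    """Undersample every class down to the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

class CentroidClassifier:
    """Toy base learner: predict the class with the nearest mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance to each class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

def ensemble_predict(models, X):
    """Combine base classifiers by majority vote."""
    votes = np.stack([m.predict(X) for m in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy imbalanced data: 200 majority samples near (0, 0), 20 minority near (3, 3).
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (20, 2))])
y = np.array([0] * 200 + [1] * 20)

# Iteratively draw balanced subsets and train one base classifier per subset.
models = [CentroidClassifier().fit(*balanced_subset(X, y, rng)) for _ in range(10)]
pred = ensemble_predict(models, X)
```

Because every base classifier sees a balanced subset, the ensemble is not dominated by the majority class, so minority-class recall stays high; SPISE improves on this outline by choosing subsets that are diverse yet representative of the full dataset rather than uniformly random.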