Abstract

Imbalanced datasets pose frequent and challenging problems in many real-world applications. Classification models learned from class-imbalanced data are often biased towards the majority class. Typical imbalanced learning (IL) approaches, e.g., SMOTE, AdaCost, and Cascade, often perform poorly on complex tasks with class overlap or a high imbalance ratio. In this paper, we systematically investigate the IL problem and propose a novel framework named sparse projection infinite selection ensemble (SPISE). SPISE iteratively resamples balanced subsets and combines the classifiers trained on these subsets for imbalanced classification. In this process, it accounts for both the diversity of the classifier ensemble and the similarity between the subsets and the whole dataset. Specifically, we present a graph-based approach named infinite subset selection to adaptively sample diverse yet representative subsets. Additionally, a random sparse projection is combined with feature selection at the beginning of each iteration to augment the training features and enhance the diversity of the generated subsets. SPISE can be easily adapted to most existing classifiers (e.g., support vector machine and random forest) to boost their performance for IL. Quantitative experiments on 26 imbalanced benchmark datasets substantiate the effectiveness and superiority of the proposed model compared with other popular approaches.
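To make the core idea concrete, the following is a minimal sketch of the resample-then-ensemble loop the abstract describes: repeatedly draw class-balanced subsets, train one classifier per subset, and combine them by majority vote. This is an illustrative simplification, not the authors' SPISE; the graph-based infinite subset selection and the random sparse projection with feature selection are omitted, and `CentroidClassifier` is a hypothetical stand-in for any base learner (e.g., an SVM or random forest).

```python
import numpy as np

def balanced_subsets(X, y, n_subsets=5, rng=None):
    """Draw balanced subsets by undersampling the majority class.

    A simplified stand-in for SPISE's subset sampling: SPISE additionally
    selects subsets via a graph-based method and augments features with a
    random sparse projection, both omitted here.
    """
    rng = np.random.default_rng(rng)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    for _ in range(n_subsets):
        picked = rng.choice(majority, size=minority.size, replace=False)
        idx = np.concatenate([minority, picked])
        yield X[idx], y[idx]

class CentroidClassifier:
    """Minimal base learner: predict the class of the nearer class centroid."""
    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self
    def predict(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        return (d1 < d0).astype(int)

def ensemble_predict(models, X):
    """Majority vote over the classifiers trained on the balanced subsets."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Imbalanced toy data: 200 majority samples (class 0) vs 20 minority (class 1).
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(200, 2))
X1 = rng.normal(3.0, 1.0, size=(20, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 20)

models = [CentroidClassifier().fit(Xs, ys)
          for Xs, ys in balanced_subsets(X, y, n_subsets=5, rng=1)]
pred = ensemble_predict(models, X)
```

Because every subset is balanced, each base learner sees the minority class as often as the majority class, which is what counteracts the majority-class bias mentioned above; the ensemble then recovers coverage of the majority class that any single undersampled subset would lose.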
