Abstract

In many real-world applications, classification problems suffer from class imbalance. Classification methods for imbalanced data that rely on data processing alone or on algorithm improvement alone cannot achieve satisfactory classification performance on the minority class. This paper proposes an ensemble classification method for imbalanced data based on dynamic model selection driven by data partition hybrid sampling. The method has two core components: the generation of balanced datasets and the dynamic selection of classification models. At the data level, a data partition hybrid sampling (DPHS) method is proposed to balance datasets. In particular, the data space is divided into four regions according to the majority class proportion in minority class neighborhoods. We then present a boundary minority class weighted over-sampling (BMW-SMOTE) method, in which the weight of each minority class instance is the ratio between the majority class proportion in the neighborhood of that instance and the sum of all these proportions. The number of synthetic instances generated for each instance is determined by its weight. At the algorithm level, we present a model dynamic selection (MDS) strategy in which three ensemble learning models are built. Among them, the local regions reinforce and weaken model is trained on the balanced dataset obtained by the proposed DPHS method, which strengthens the identification of test instances on the boundary and appropriately weakens the dense distribution of the majority class. The model for each test instance is selected adaptively according to the imbalance degree of its neighbors. The experimental results show that the proposed method outperforms typical imbalanced classification methods in terms of F-measure and G-mean.
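
The weighting rule described for BMW-SMOTE can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `majority_proportions` and `synthetic_counts`, the use of Euclidean k-nearest neighbors, the default `k=5`, the total budget `n_synthetic`, and the rounding step are all assumptions made for illustration. The abstract states only that each weight is an instance's neighborhood majority proportion divided by the sum of those proportions, and that the weight determines the number of synthetic instances. The sketch also applies the rule to every minority instance, whereas the method as described targets boundary minority instances.

```python
import numpy as np

def majority_proportions(X, y, minority_label, k=5):
    """Return minority indices and, for each, the fraction of its
    k nearest neighbours (Euclidean, assumed) that are majority class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    # Pairwise distances from each minority instance to every instance.
    diff = X[min_idx][:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # An instance is not its own neighbour.
    dist[np.arange(len(min_idx)), min_idx] = np.inf
    nn = np.argsort(dist, axis=1)[:, :k]
    return min_idx, (y[nn] != minority_label).mean(axis=1)

def synthetic_counts(proportions, n_synthetic):
    """Weight = proportion / sum of proportions; the count per instance is
    the weight times a total budget, rounded (rounding is an assumption)."""
    proportions = np.asarray(proportions, dtype=float)
    total = proportions.sum()
    if total == 0:
        # No minority instance has majority neighbours: nothing to weight.
        return np.zeros(len(proportions), dtype=int)
    weights = proportions / total
    return np.rint(weights * n_synthetic).astype(int)
```

Under this rule, minority instances surrounded by more majority neighbors receive more synthetic instances. Because each count is rounded independently, the counts may not sum exactly to `n_synthetic`.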
