Class imbalance is one of the main challenges that hinder classifiers from correctly identifying unseen instances. When an imbalanced class distribution and class overlap coexist in the same dataset, the recognition of minority instances by traditional classifiers is seriously affected. In binary classification, the relationship between the majority class and the minority class is clear, and class overlap can be handled well by removing the majority instances in the class-overlap area. In multi-class imbalance problems, however, the relationships between classes become complex. Clustering techniques are effective for identifying overlapping instances, but a single clustering method cannot effectively identify clusters of different shapes. Therefore, in this paper, a Heterogeneous Clustering Ensemble learning method for Multiple Class-overlap Detection (HCE-MCD) is proposed for multi-class imbalance problems. The method uses a genetic algorithm to select, from a pool of clustering algorithms, the combination of heterogeneous clustering techniques with the best fitness value, and uses majority voting to fuse the results of the selected clustering algorithms. Furthermore, instances located near the decision boundary are more likely to be overlapping instances. Therefore, according to the clustering results of the heterogeneous clustering ensemble, we can identify the overlapping instances in the dataset, thereby eliminating the class overlap problem in the multi-class dataset. Experimental results on 19 open-access datasets show that the proposed method outperforms, or partially outperforms, state-of-the-art methods.
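
The majority-voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how an individual clustering marks an instance as overlapping, so the purity criterion used here, the `purity_threshold` parameter, and the function names `flag_overlap_in_clustering` and `majority_vote_overlap` are all assumptions made for illustration. The genetic-algorithm selection of clusterers is omitted, and the cluster assignments are supplied directly as lists.

```python
from collections import Counter


def flag_overlap_in_clustering(cluster_ids, class_labels, purity_threshold=0.8):
    """Flag every instance that sits in a class-impure cluster.

    Assumption (not from the paper): a cluster counts as an overlap region
    when its most frequent class covers less than `purity_threshold` of it.
    """
    members = {}
    for idx, cid in enumerate(cluster_ids):
        members.setdefault(cid, []).append(idx)

    flags = [False] * len(cluster_ids)
    for idxs in members.values():
        counts = Counter(class_labels[i] for i in idxs)
        purity = counts.most_common(1)[0][1] / len(idxs)
        if purity < purity_threshold:
            for i in idxs:
                flags[i] = True
    return flags


def majority_vote_overlap(clusterings, class_labels, purity_threshold=0.8):
    """Fuse per-clustering overlap flags by strict majority vote."""
    votes = [
        flag_overlap_in_clustering(c, class_labels, purity_threshold)
        for c in clusterings
    ]
    n = len(clusterings)
    return [sum(v[i] for v in votes) * 2 > n for i in range(len(class_labels))]


# Toy data: eight instances from three classes, clustered three different ways
# (standing in for the outputs of three heterogeneous clustering algorithms).
class_labels = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusterings = [
    [0, 0, 0, 1, 1, 2, 2, 2],  # every cluster is pure
    [0, 0, 1, 1, 1, 2, 2, 2],  # cluster 1 mixes classes a and b
    [0, 0, 1, 1, 2, 2, 2, 2],  # clusters 1 and 2 are both mixed
]
overlap = majority_vote_overlap(clusterings, class_labels)
print(overlap)
```

In this toy example only the instances at indices 2, 3, and 4 are flagged, because at least two of the three clusterings place them in a mixed cluster. Which of the flagged instances would then be removed in the multi-class setting is a separate design decision that this sketch does not address.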