Abstract

In binary classification, the class-imbalance problem occurs when the number of samples in one class is much larger than in the other. In such cases, a classifier generally performs poorly on the minority class. Classifier ensembles are used to tackle this problem: each member is trained on a different balanced dataset obtained by randomly undersampling the majority class and/or randomly oversampling the minority class. Although the primary target of imbalance learning is the minority class, undersampling-based schemes use the same minority sample set for all members, whereas oversampling the minority class is challenging because its structure is unclear. On the other hand, heterogeneous ensembles, which use multiple learning algorithms, have a higher potential for generating diverse members than homogeneous ones. This study addresses the use of heterogeneous ensembles for imbalance learning. Experiments are conducted on 66 datasets to explore the relation between the heterogeneity of the ensemble and its performance, measured by AUC and F1 scores. The results show that the performance scores improve as the number of classification methods is increased from one to five. Moreover, heterogeneous ensembles achieve significantly higher scores than homogeneous ones. Using multiple balancing schemes also improves the performance scores of some homogeneous and heterogeneous ensembles, but the improvements are not significant for either approach.
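
To make the undersampling-based setup concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary labels stored in a plain list, and it only plans the ensemble: each member receives a balanced index set (all minority samples plus an equally sized random draw from the majority class) and one learning algorithm assigned in round-robin order. The function names and the algorithm labels are hypothetical and stand in for real learners.

```python
import random
from collections import Counter
from itertools import cycle


def balanced_undersample(labels, rng):
    """Return sorted indices of a balanced subset of a binary-labelled set.

    All minority samples are kept; the majority class is randomly
    undersampled (without replacement) to the minority class size.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, _), (minority, _) = counts.most_common(2)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    sampled = rng.sample(majority_idx, len(minority_idx))
    return sorted(minority_idx + sampled)


def plan_heterogeneous_ensemble(labels, algorithms, n_members, seed=0):
    """Pair each ensemble member with an algorithm and a balanced subset.

    Algorithms are assigned round-robin, so using more than one algorithm
    yields a heterogeneous ensemble; a single algorithm yields a
    homogeneous one.
    """
    rng = random.Random(seed)
    plan = []
    for _, algorithm in zip(range(n_members), cycle(algorithms)):
        plan.append((algorithm, balanced_undersample(labels, rng)))
    return plan


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
labels = [0] * 90 + [1] * 10
# Hypothetical algorithm labels; real learners would be plugged in here.
algorithms = ["decision_tree", "knn", "naive_bayes"]
plan = plan_heterogeneous_ensemble(labels, algorithms, n_members=6)

for algorithm, indices in plan:
    print(algorithm, len(indices))
```

Every member in this sketch sees the same ten minority samples and differs only in its majority draw and its algorithm, which mirrors the limitation of undersampling-based schemes noted above and shows where heterogeneity adds a second source of diversity.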
