Comparing SVM ensembles for imbalanced datasets

Vasudha Bhatnagar, Manish Bhardwaj, A Mahabal

doi:10.1109/isda.2010.5687191

Abstract

Real-life datasets often suffer from class imbalance, which hampers the supervised learning process. In such datasets, examples of the positive (minority) class are significantly fewer than those of the negative (majority) class. Constructing high-quality classifiers from such imbalanced training data is one of the major challenges in machine learning, since traditional classification algorithms tend to become biased towards the majority class. In this paper, we compare three ensemble-based approaches for handling imbalanced datasets. All three approaches aim to overcome the under-representation of the minority class by learning from every minority class sample and a subset of the majority class samples. The three approaches, namely PARTEN, UMjC and LFM, were evaluated on several public datasets. Precision, recall, F-measure, g-mean and ROC space measures were used for comparison. A detailed discussion of the results is presented in the paper. Subsequently, we present an astronomy application in which the three methods are compared for the prediction of class II, IIn and IIp supernovae.
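As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes precision, recall, F-measure and g-mean from binary predictions. It is not taken from the paper: the function name `imbalance_metrics` is hypothetical, the minority class is assumed to be labelled 1, and g-mean is assumed to follow the common definition as the geometric mean of recall on the positive class and recall on the negative class (specificity).

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F-measure and g-mean for a binary problem.

    Hypothetical helper for illustration; `positive` marks the minority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    # Assumed definition: geometric mean of the two per-class recalls.
    g_mean = math.sqrt(recall * specificity)

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }
```

Unlike plain accuracy, g-mean falls to zero when a classifier labels every sample as the majority class, which is one reason it is often reported for imbalanced data.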

Full Text