Abstract

Many real-world data sets have an imbalanced class distribution. Learning from such data biases the classifier towards the majority class, so minority-class samples tend to be misclassified. In this paper, we propose SkewBoost, a technique that classifies minority instances correctly without compromising much on the correct classification of majority instances. In SkewBoost, minority and majority instances are identified during the execution of the boosting algorithm. A variation of SMOTE is used to create synthetic minority instances, which are added to the training set, after which the total weight is rebalanced. After each iteration of the boosting algorithm, the weight of each instance is modified to focus more on the misclassified instances; a cost-sensitive approach is adopted to reweight the instances after every iteration. The method is evaluated on imbalanced data sets in terms of F-measure, G-mean, AUC, recall and precision, and compared against previously published results of algorithms on imbalanced data sets.
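To make the described loop concrete, below is a minimal Python sketch of a SMOTE-plus-boosting procedure with cost-sensitive reweighting, in the spirit of the abstract. The exact SMOTE variation, cost values, and weight-update rule used by SkewBoost are not given here; the choices below (imbalanced-learn's plain SMOTE, an AdaBoost-style update scaled by a minority-misclassification cost, the `cost` and `minority_label` parameters) are assumptions for illustration only.

```python
# Illustrative sketch only: a SMOTE-augmented boosting loop with
# cost-sensitive reweighting. Not the authors' exact SkewBoost algorithm.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def skewboost_like(X, y, n_rounds=10, minority_label=1, cost=2.0, random_state=0):
    """Train an ensemble on imbalanced data; returns (learners, alphas)."""
    rng = np.random.RandomState(random_state)
    w = np.full(len(y), 1.0 / len(y))   # instance weights on the original data
    learners, alphas = [], []
    for _ in range(n_rounds):
        # 1. Create synthetic minority instances (plain SMOTE stands in for the
        #    SMOTE variation mentioned in the abstract).
        sm = SMOTE(random_state=rng.randint(1 << 30))
        X_res, y_res = sm.fit_resample(X, y)
        # 2. Give synthetic instances a small uniform weight and rebalance the
        #    total weight so it still sums to one.
        n_new = len(y_res) - len(y)
        w_res = np.concatenate([w, np.full(n_new, w.min())])
        w_res /= w_res.sum()
        # 3. Fit a weak learner on the augmented, reweighted training set.
        clf = DecisionTreeClassifier(max_depth=2, random_state=random_state)
        clf.fit(X_res, y_res, sample_weight=w_res)
        learners.append(clf)
        # 4. AdaBoost-style error and learner weight, computed on the original data.
        pred = clf.predict(X)
        miss = pred != y
        err = np.clip(np.dot(w, miss), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        alphas.append(alpha)
        # 5. Cost-sensitive reweighting: misclassified minority instances receive
        #    a larger weight increase than misclassified majority instances.
        factor = np.where(miss, np.exp(alpha), np.exp(-alpha))
        factor[miss & (y == minority_label)] *= cost
        w = w * factor
        w /= w.sum()
    return learners, alphas
```

Prediction from such an ensemble would typically be a weighted vote of the weak learners using the `alphas` as voting weights.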
