Imbalanced Classification Problems: Systematic Study, Issues and Best Practices

Camelia Lemnaru, Rodica Potolea

doi:10.1007/978-3-642-29958-2_3

Abstract

This paper presents a systematic study of the class imbalance problem and of possible solutions to it. A set of standard classification algorithms is considered and their performance on benchmark data is analyzed. Our experiments show that, in an imbalanced problem, the imbalance ratio (IR) can be used in conjunction with the instances per attribute ratio (IAR) to identify the classifier that best fits the situation. In addition, MLP and C4.5 are less affected by the imbalance, while SVM generally performs poorly on imbalanced problems. Possible solutions for overcoming these classifier issues are also presented. The overall conclusion is that, when dealing with imbalanced problems, one should consider a wider context and take several factors into account simultaneously: the imbalance itself, other data-related particularities, and the classification algorithms with their associated parameters.
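
As a rough illustration of the two dataset descriptors named above, the following Python sketch computes them for a toy label list. The abstract does not spell out the formulas, so the definitions here are assumptions: IR is taken as the majority class count divided by the minority class count, and IAR as the number of instances divided by the number of attributes. The function names and the toy data are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: majority class count / minority class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute an imbalance ratio")
    return max(counts.values()) / min(counts.values())


def instances_per_attribute(n_instances, n_attributes):
    """Assumed definition: number of instances / number of attributes."""
    if n_attributes <= 0:
        raise ValueError("n_attributes must be positive")
    return n_instances / n_attributes


# Toy dataset (hypothetical): 90 negative and 10 positive instances, 5 attributes.
labels = ["neg"] * 90 + ["pos"] * 10
ir = imbalance_ratio(labels)
iar = instances_per_attribute(len(labels), 5)
print(f"IR = {ir:.1f}, IAR = {iar:.1f}")
```

Under these assumed definitions, a larger IR indicates a more skewed class distribution, and a smaller IAR indicates fewer training instances relative to the dimensionality of the data.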

Full Text