Classifier Ensemble Design for Imbalanced Data Classification: A Hybrid Approach

Uma R Salunkhe, Suresh N Mali

doi:10.1016/j.procs.2016.05.259

Abstract

Imbalanced learning for classification problems is an active area of research in machine learning. Many classification systems, such as image retrieval and credit scoring systems, have training datasets with an imbalanced class distribution, which degrades classifier performance. Re-sampling is commonly used to handle an imbalanced distribution because it is independent of the classifier being used. However, re-sampling methods can sometimes remove necessary data from a class or cause over-fitting. Classifier ensembles have recently received increasing attention as an effective technique for handling skewed data. The focus of this work is to combine the advantages of the data-level and classifier ensemble approaches in order to improve classification performance. We present a novel approach that first applies pre-processing to the imbalanced dataset in order to reduce the imbalance between the classes. The pre-processed data is then provided as the training dataset to a classifier ensemble that introduces diversity by using different training datasets as well as different classifier models. Experiments conducted on eight imbalanced datasets from the KEEL repository support the significance of the proposed method. A comparative analysis shows a performance improvement in terms of Area under the ROC Curve (AUC).
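
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the pre-processing method or the classifier models, so random oversampling, a nearest-centroid scorer, and a k-nearest-neighbour scorer are stand-ins chosen here for illustration. All function names, the synthetic data, and the parameters (`n_rounds`, `k`, the seeds) are assumptions.

```python
import random

def oversample(X, y, rng):
    """Data-level step (assumed): duplicate minority rows until classes balance."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def bootstrap(X, y, rng):
    """Stratified bootstrap: resample each class with replacement."""
    idx = []
    for label in (0, 1):
        members = [i for i, lab in enumerate(y) if lab == label]
        idx += [rng.choice(members) for _ in members]
    return [X[i] for i in idx], [y[i] for i in idx]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_centroid(X, y):
    """Stand-in model 1: score rises as x gets closer to the positive centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c1 = mean([x for x, lab in zip(X, y) if lab == 1])
    c0 = mean([x for x, lab in zip(X, y) if lab == 0])
    def score(x):
        d0, d1 = sq_dist(x, c0), sq_dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return score

def fit_knn(X, y, k=5):
    """Stand-in model 2: fraction of positives among the k nearest rows."""
    def score(x):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / k
    return score

def fit_hybrid_ensemble(X, y, n_rounds=5, seed=0):
    """Pre-process once, then train members on different samples and models."""
    rng = random.Random(seed)
    Xb, yb = oversample(X, y, rng)
    members = []
    for r in range(n_rounds):
        Xs, ys = bootstrap(Xb, yb, rng)                 # different training data
        fit = fit_centroid if r % 2 == 0 else fit_knn   # different model types
        members.append(fit(Xs, ys))
    return lambda x: sum(m(x) for m in members) / len(members)

def auc(scores, labels):
    """AUC as the chance a random positive outranks a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n_neg, n_pos, rng):
    """Synthetic 2-D data (hypothetical, not one of the KEEL datasets)."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos

data_rng = random.Random(42)
X_train, y_train = make_data(180, 20, data_rng)   # 9:1 imbalance
X_test, y_test = make_data(90, 10, data_rng)
model = fit_hybrid_ensemble(X_train, y_train)
test_auc = auc([model(x) for x in X_test], y_test)
print(f"AUC: {test_auc:.3f}")
```

Averaging member scores is one simple way to combine the ensemble; the abstract does not state the combination rule, so voting or weighted schemes would be equally plausible readings.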

Full Text