Abstract

The aim of this paper is to investigate the effects of combining various sampling methods and ensemble classifiers on prediction performance in multiclass imbalanced data learning. This research uses shape data extracted from Malaysian medicinal leaf images together with three other large benchmark datasets, and seven ensemble methods from the Weka machine learning tool were selected to perform the classification task: AdaBoostM1, Bagging, Decorate, END, MultiBoostAB, RotationForest, and Stacking. In addition, five base classifiers (Naive Bayes, SMO, J48, Random Forest, and Random Tree) were used to examine the performance of the ensemble methods. Two ways of combining sampling with ensemble classifiers were evaluated: Resample with an ensemble classifier and SMOTE with an ensemble classifier. The experimental results show that no single configuration is a “one design that fits all”. However, they demonstrate that coupling a sampling method and an ensemble classifier with Random Forest improves the prediction performance of the classification task on multiclass imbalanced datasets.
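As a minimal sketch of the kind of combination the abstract describes, the Java snippet below couples SMOTE with a Bagging ensemble over Random Forest in Weka via a FilteredClassifier, so the oversampling is applied only to the training folds during cross-validation. The dataset path, SMOTE percentage, random seed, and ensemble settings are illustrative assumptions, not the paper's actual configuration, and the SMOTE filter requires the separately installed Weka SMOTE package.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.instance.SMOTE;

public class SmoteEnsembleSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file name; the leaf-shape dataset is not distributed with the paper.
        Instances data = DataSource.read("leaf_shape.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // SMOTE oversamples a minority class; 100% is an illustrative setting only.
        SMOTE smote = new SMOTE();
        smote.setPercentage(100.0);

        // Bagging with Random Forest as the base learner, one of the combinations examined.
        Bagging bagging = new Bagging();
        bagging.setClassifier(new RandomForest());

        // FilteredClassifier applies the sampling filter to each training split,
        // leaving the held-out test folds untouched.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(smote);
        fc.setClassifier(bagging);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(fc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString()); // per-class precision, recall, F-measure
    }
}
```

The same pattern works for the Resample variant by swapping in weka.filters.supervised.instance.Resample, whose bias-to-uniform-class option rebalances the class distribution instead of generating synthetic minority instances.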
