Bagging and random forest classification methods for unbalanced data school dropout cases in Lampung province

Dhery Setiawan, La Ode Abdul Rahman, Hari Wijayanto

doi:10.1063/5.0109130

Abstract

Classification modeling is growing rapidly and is now used in many fields. Much research has been conducted to determine the best classification method for predicting the class of an observation, and most of these studies report that bagging and random forest are the best-performing methods. However, most classification methods run into problems when they are used to model unbalanced data. The number of students who drop out of school is small relative to the number of students who are currently active, so school dropout makes a suitable case study of unbalanced data. The purpose of this research is to compare the performance of the bagging and random forest methods before the class imbalance is handled with their performance after it is handled using the Synthetic Minority Oversampling Technique (SMOTE). Performance is compared using the sensitivity, balanced accuracy, and F1 score of each classification method. The results show that the random forest method performs better than the bagging method, both before and after the unbalanced data are handled.
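As a rough illustration of the three evaluation measures named in the abstract, the following sketch computes sensitivity, balanced accuracy, and F1 score from true and predicted labels using their standard definitions. It is not the authors' code: the function names (`confusion_counts`, `imbalance_metrics`), the coding of dropout as class 1, and the ten example labels are all hypothetical and chosen only to show why plain accuracy can mislead on unbalanced data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, balanced accuracy, F1, and plain accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "sensitivity": sensitivity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical labels (assumption): 1 = dropped out, 0 = still active.
# Only 2 of the 10 students belong to the minority (dropout) class.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = imbalance_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this made-up example the plain accuracy is 0.8 even though only one of the two dropout cases is detected (sensitivity 0.5), which is the kind of gap that motivates reporting sensitivity, balanced accuracy, and F1 score instead of accuracy alone.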
