Abstract

Many previous studies have ignored class imbalance in their datasets. A dataset is imbalanced when one class label occurs far more or far less often than the others, so that it contains a rare class (a label with few instances) and an abundant class (a label with many instances). This condition can degrade the predictions of a classification model: the model fails to distinguish the rare class from the abundant class and tends to assign the abundant label to all instances. Several previous studies have stated that imbalance handling can address unbalanced classes in a dataset. Imbalance handling rebalances the rare and abundant classes so that the predictive performance of the model can be maximized. This study aims to show how imbalanced data degrades classification results and to find the most suitable resampling method for imbalance handling. The resampling methods used in this study are Random Over-Sampling (ROS), Random Under-Sampling (RUS), and SMOTE. The datasets tested were car_insurance, employees_attrition, and telco_customers, each classified using an Artificial Neural Network (ANN), Decision Tree, k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naïve Bayes, and Stacking Ensemble Learning. For the car_insurance dataset, the best result is obtained by Random Over-Sampling with the KNN classifier, with an accuracy of 84%. For the employees_attrition dataset, the best results are obtained by Random Over-Sampling with the Decision Tree, KNN, and Stacking classifiers, each reaching an accuracy of 99%. For the telco_customers dataset, the best result is obtained by Random Over-Sampling with the KNN classifier, with an accuracy of 84%. Across the three datasets tested, the best resampling method is Random Over-Sampling.
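
The two random resampling methods named above can be illustrated with a minimal sketch. This is not the implementation used in the study; the function names, the toy data, and the fixed seed are assumptions made only for illustration. Random Over-Sampling duplicates randomly chosen rare-class rows until every class matches the largest one, while Random Under-Sampling discards randomly chosen abundant-class rows until every class matches the smallest one.

```python
import random
from collections import Counter


def _group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_over_sample(rows, labels, seed=0):
    """ROS sketch: draw rare-class rows with replacement until all
    classes are as large as the most abundant class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """RUS sketch: keep a random subset of each class, sized to match
    the rarest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 8 abundant rows (label 0) and 2 rare rows (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

ros_rows, ros_labels = random_over_sample(rows, labels)
rus_rows, rus_labels = random_under_sample(rows, labels)

print(Counter(ros_labels))  # each class now has 8 rows
print(Counter(rus_labels))  # each class now has 2 rows
```

Resampling is usually applied only to the training split, so that the test split keeps the original class distribution. SMOTE differs from ROS in that it synthesizes new rare-class points by interpolating between neighbouring rare-class rows rather than duplicating existing ones.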
