EASY ENSEMBLE WITH RANDOM FOREST TO HANDLE IMBALANCED DATA IN CLASSIFICATION

Sarini Abdullah, Gv Prasetyo

doi:10.14710/jfma.v3i1.7415


Open Access

https://doi.org/10.14710/jfma.v3i1.7415


Abstract

Imbalanced data can cause issues at the problem-definition level, the algorithm level, and the data level. Several methods have been developed to overcome this issue; one state-of-the-art method is Easy Ensemble, which has been claimed to improve model performance on the minority class and to overcome the main deficiency of random undersampling, namely the loss of information from discarded majority-class samples. In this paper we discuss the implementation of Easy Ensemble with Random Forest classifiers to handle the imbalance problem in a credit scoring case. The combined method is applied to two data sets taken from data science competition websites, finhacks.id and kaggle.com, with majority-to-minority class proportions of 70:30 and 94:6, respectively. The results show that resampling with Easy Ensemble improves Random Forest classifier performance on the minority class, with minority-class recall increasing significantly after resampling. For the first data set (finhacks.id), minority-class recall rose from 0.49 before resampling to 0.82 after; for the second data set (kaggle.com), it rose from just 0.14 to 0.73.
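The abstract reports recall on the minority class, i.e. the fraction of actual minority cases that the model correctly flags: recall = TP / (TP + FN). The sketch below computes it by hand and with scikit-learn's `recall_score` on hypothetical toy labels (not the paper's data). The helper name `minority_recall` is illustrative, and label 1 is assumed to mark the minority class.

```python
import numpy as np
from sklearn.metrics import recall_score


def minority_recall(y_true, y_pred, minority_label=1):
    """Hypothetical helper: recall on the minority class, TP / (TP + FN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_minority = y_true == minority_label
    true_pos = np.sum(is_minority & (y_pred == minority_label))
    false_neg = np.sum(is_minority & (y_pred != minority_label))
    if true_pos + false_neg == 0:
        return 0.0  # no minority cases in y_true, so recall is undefined
    return true_pos / (true_pos + false_neg)


# Toy labels (assumed): 90 majority cases and 10 minority cases,
# of which the model correctly flags 7 and misses 3.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.array([0] * 90 + [1] * 7 + [0] * 3)

print(minority_recall(y_true, y_pred))            # 0.7
print(recall_score(y_true, y_pred, pos_label=1))  # 0.7
```

Read this way, the kaggle.com result means the resampled model catches about 73 of every 100 minority cases instead of about 14.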

Highlights

In real-world problems, imbalanced data are common, for example in medical applications that classify breast cancer type [1], cervical cancer [2], and lung cancer [3].
In addition to its high classification accuracy, random forest is also used as a variable selection tool, which improves the performance of the predictive model [10,11,12]. This use may be motivated by the robustness of random forest results: the variables identified as important should be those that are consistently chosen in the splitting rules when they appear in the random feature subsets used to grow a tree on each new bootstrap sample. Considering these studies, we propose the use of random forest for classification in this study.
We propose the Easy Ensemble method, an imbalanced-learning technique, to handle the class imbalance problem in classification, with Random Forest as the classifier.
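The Easy Ensemble plus Random Forest combination described in the highlights can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes binary labels with 0 as the majority class and 1 as the minority class. Each balanced subset keeps every minority sample plus an equal-sized draw, without replacement, from the majority class. One scikit-learn `RandomForestClassifier` is fitted per subset, and the members' minority-class probabilities are averaged against a 0.5 threshold. The helper names `fit_easy_ensemble_rf` and `predict_easy_ensemble_rf`, the number of subsets, and the forest size are hypothetical choices. The synthetic data only mimics the 94:6 proportion and is not the kaggle.com data set. The last lines average the forests' impurity-based `feature_importances_`, which reflects the variable-ranking use of random forest mentioned above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split


def fit_easy_ensemble_rf(X, y, n_subsets=10, random_state=0):
    """Hypothetical helper: fit one random forest per balanced subset.

    Assumes label 1 is the minority class and label 0 the majority class.
    """
    rng = np.random.default_rng(random_state)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    forests = []
    for t in range(n_subsets):
        # Undersample the majority class to the minority-class size.
        sampled = rng.choice(majority_idx, size=len(minority_idx), replace=False)
        subset = np.concatenate([minority_idx, sampled])
        forest = RandomForestClassifier(n_estimators=100, random_state=random_state + t)
        forest.fit(X[subset], y[subset])
        forests.append(forest)
    return forests


def predict_easy_ensemble_rf(forests, X, threshold=0.5):
    """Hypothetical helper: average P(class 1) over all forests, then threshold."""
    probs = np.mean(
        [f.predict_proba(X)[:, list(f.classes_).index(1)] for f in forests], axis=0
    )
    return (probs >= threshold).astype(int), probs


# Synthetic data with roughly a 94:6 majority-to-minority split (assumed setup).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.94, 0.06], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Baseline: a single random forest trained on the imbalanced training set.
baseline = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
forests = fit_easy_ensemble_rf(X_train, y_train, n_subsets=10)
y_pred, _ = predict_easy_ensemble_rf(forests, X_test)

print("baseline minority recall:", recall_score(y_test, baseline.predict(X_test)))
print("Easy Ensemble minority recall:", recall_score(y_test, y_pred))

# Mean impurity-based importance across the ensemble's forests.
importances = np.mean([f.feature_importances_ for f in forests], axis=0)
print("top 5 features:", np.argsort(importances)[::-1][:5])
```

Using many undersampled subsets rather than one is what addresses the information loss of plain random undersampling: across subsets, most majority-class samples get used by some forest. As an alternative, the imbalanced-learn package provides `imblearn.ensemble.EasyEnsembleClassifier`, which by default uses AdaBoost learners rather than random forests.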

Summary

Introduction

In real-world problems, imbalanced data are common, for example in medical applications that classify breast cancer type [1], cervical cancer [2], and lung cancer [3]. Imbalanced data problems also arise in finance, for example in credit scoring classification [4] and fraud detection [5]. Imbalanced data can cause problems when building a model, because the classification model tends to predict the majority class. The last data set generated had a heavily imbalanced proportion between the two classes, 5:95.
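The following sketch illustrates why a model trained on imbalanced data can look good while failing on the minority class. It uses hypothetical labels with the 5:95 split above and scikit-learn's `DummyClassifier` as a stand-in for a model that always predicts the majority class. That baseline reaches 95% accuracy yet has zero minority-class recall, which is why recall on the minority class, rather than accuracy, is the metric of interest here.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels with a 5:95 split: 5 minority cases (1), 95 majority cases (0).
y = np.array([1] * 5 + [0] * 95)
X = np.zeros((len(y), 1))  # features are irrelevant to this baseline

# A classifier that always predicts the most frequent (majority) class.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

print(accuracy_score(y, y_pred))             # 0.95
print(recall_score(y, y_pred, pos_label=1))  # 0.0
```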



Journal: Journal of Fundamental Mathematics and Applications (JFMA)
Publication Date: Jun 10, 2020
Citations: 3
License type: CC BY


Similar Papers

OBGAN: Minority oversampling near borderline with generative adversarial networks
Wonkeun Jo ... Dongil Kim
Expert Systems With Applications | VOL. 197 | 26 Feb 2022

Data mining from extreme data sets: very large and/or very skewed data sets
L.O Hall
07 Oct 2001

To combat multi-class imbalanced problems by means of over-sampling and boosting techniques
Lida Abdi ... Sattar Hashemi
Soft Computing | VOL. 19 | 30 Apr 2014

SMOTEBoost: Improving Prediction of the Minority Class in Boosting
Nitesh V Chawla ... Lawrence O Hall
01 Jan 2003
