APPLICATION OF SAMPLING TECHNIQUES TO OVERCOME CLASS IMBALANCE IN ONLINE SHOPPERS INTENTION CLASSIFICATION

Ardiyansyah Ardiyansyah, Panny Agustia Rahayuningsih

doi:10.59697/jtik.v4i1.627

Abstract

Online shopping, or e-commerce, is a transaction process carried out through intermediary media such as online trading sites and social media that offer goods and services for sale. Much research has focused on predicting real-time revenue for shopping websites. The dataset used in this study consists of 10 numeric attributes and 8 categorical attributes, and its target variable may be unbalanced across its individual values. The online shoppers intention task aims to predict whether a user generates revenue or not. Class imbalance occurs when the minority class has far fewer instances than the majority class, and training on unbalanced data tends to yield low accuracy on the minority class. To overcome the class imbalance problem, three sampling methods are applied: SMOTE, undersampling, and oversampling. The classification algorithms used are random forest, KNN, and Naive Bayes, and performance is measured with the F-measure and AUC. Based on the evaluation and validation results, the best sampling method for overcoming class imbalance in this study is oversampling. Without sampling, the random forest model has the highest F-measure of the models compared, 0.898. After applying the sampling methods, the SMOTE + random forest, undersampling + random forest, and oversampling + random forest models were compared. The best model, with the highest F-measure and AUC, is oversampling + random forest, with an F-measure of 0.976 (about 98%) and an AUC of 0.998. The oversampling + random forest model is therefore the best model in this study of sampling techniques for overcoming class imbalance in online shoppers intention classification.
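As a rough illustration of the two random resampling ideas named in the abstract, and of the F-measure used to compare models, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `f_measure`), the toy data, and the fixed seed are all assumptions made for the example, and the abstract does not state how the F-measure was averaged, so the binary form for a single positive class is shown here. SMOTE, which synthesizes new minority points by interpolation, is not covered.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def f_measure(y_true, y_pred, positive=1):
    """Binary F-measure: harmonic mean of precision and recall
    for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 8 "no revenue" sessions (0) and 2 "revenue" sessions (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # class counts after oversampling
print(Counter(under_labels))  # class counts after undersampling
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keeps the original class distribution. A classifier that always predicts the majority class scores an F-measure of 0 on the minority class here, which is why accuracy alone can be misleading on unbalanced data.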
