Abstract

In machine learning, an imbalanced classification problem refers to a dataset in which the classes are not evenly distributed. This problem commonly occurs when the distribution of labels or classes is far from uniform. Resampling methods, which add samples to the minority class or drop samples from the majority class, are widely considered an effective solution to this problem. The focus of this study is to propose a framework for handling any imbalanced dataset for fraud detection. To this end, undersampling (Random and NearMiss) and oversampling (Random, SMOTE, and Borderline-SMOTE) were used as the resampling techniques at the centre of our experiments to balance the evaluated dataset. For the first time, a large-scale imbalanced dataset collected from the Kaggle website was used to test both approaches for detecting fraud in electricity and gas consumption at the Tunisian electricity and gas company. The balanced data were then evaluated with four machine learning classifiers: Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and XGBoost. Standard evaluation metrics, namely precision, recall, F1-score, and accuracy, were used to assess the findings. The experimental results clearly revealed that the RF model provided the best performance, outperforming all other classifiers with a classification accuracy of 89% using NearMiss undersampling and 99% using Random oversampling.
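The random oversampling idea mentioned above can be sketched as follows. This is a minimal illustration using scikit-learn only; the function name `random_oversample` and the toy data are our own, and the NearMiss, SMOTE, and Borderline-SMOTE variants used in the paper are provided by the separate imbalanced-learn package rather than reimplemented here.

```python
# Minimal sketch of random oversampling: duplicate minority-class rows
# (sampling with replacement) until both classes have the same size.
import numpy as np
from sklearn.utils import resample

def random_oversample(X, y):
    """Return (X, y) with the minority class resampled up to majority size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    X_min, y_min = X[y == minority], y[y == minority]
    X_maj, y_maj = X[y == majority], y[y == majority]
    # Draw len(y_maj) minority samples with replacement.
    X_up, y_up = resample(X_min, y_min, replace=True,
                          n_samples=len(y_maj), random_state=0)
    return np.vstack([X_maj, X_up]), np.concatenate([y_maj, y_up])

# Toy imbalanced data: 90 negatives, 10 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # both classes now have 90 samples
```

Undersampling is the mirror image: the majority class would be resampled down (with `replace=False`) to the minority-class size, trading lost information for a smaller balanced set.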
