Abstract

Imbalanced datasets are a significant concern when building reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent machine learning (ML) algorithms and deep reinforcement learning (DRL) for CCF detection systems that classify transactions as fraud or non-fraud. SMOTE and ADASYN are used to resample the imbalanced CCF dataset under two resampling approaches. ML algorithms are then applied to the balanced dataset to establish CCF detection systems. Next, DRL is employed to create detection systems directly on the imbalanced CCF dataset. A diverse set of classification metrics is used to thoroughly evaluate the performance of these ML and DRL models. Through empirical experiments, we assess how reliable the ML models under the two resampling approaches and the DRL models are for CCF detection. When SMOTE and ADASYN are used to resample the original CCF dataset before the training/test split, the ML models show very high accuracy, above 99%. However, when these techniques are used to resample only the training portion of the CCF dataset, the ML models show lower results, particularly logistic regression, which reaches only 1.81% precision and a 3.55% F1 score with ADASYN. Our work also reveals that the DRL model is ineffective and achieves low performance, with only 34.8% accuracy.

Highlights

  • In the fourth industrial revolution, the e-commerce platform has become the most extensive system for financial institutions

  • In [2], the SMOTE-edited nearest neighbor (ENN) method was found to be the best for detecting credit card fraud (CCF) across different classifiers among a set of oversampling approaches, and SMOTE-Tomek’s Links (TL) showed good outcomes among the set of under-sampling techniques

  • In order to provide various solutions to deal with the imbalanced CCF dataset, all results related to the machine learning (ML) algorithms and the deep reinforcement learning (DRL) approach based on the imbalanced CCF


Summary

Introduction

In the fourth industrial revolution, the e-commerce platform has become the most extensive system for financial institutions. The agent eventually searches for an optimal classification policy in the dataset based on a specific reward function and a beneficial learning environment. We compare the performance of the DRL approach applied directly to the imbalanced CCF dataset with that of ML classifiers trained on the resampled CCF dataset, in order to analyze the contributions and limitations of these ML models for CCF detection systems. SMOTE and ADASYN are used to resample the imbalanced CCF dataset under two resampling approaches. Seven ML algorithms, i.e., KNN, LR, DT, RF, AdaBoost, XGBoost, and DNN, are applied to the balanced CCF dataset obtained from the two resampling approaches in order to establish the CCF detection systems. We propose suitable algorithms for dealing effectively with the imbalanced dataset in CCF detection systems.
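The difference between the two resampling approaches can be illustrated with a minimal sketch. It assumes toy two-dimensional data and a simplified SMOTE-style interpolation written in plain Python; the helper names (`smote_like`, `balance`, `stratified_split`) are hypothetical and this is not the authors' code or a library implementation of SMOTE or ADASYN. Approach 1 resamples the whole dataset and then splits it; approach 2 splits first and resamples only the training part.

```python
import random

def smote_like(minority, n_new, rng, k=3):
    """Simplified SMOTE-style oversampling (an illustrative assumption):
    interpolate between a minority point and one of its k nearest
    minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def balance(rows, rng):
    """Add synthetic fraud rows (label 1) until both classes are equal in size."""
    fraud = [x for x, y, _ in rows if y == 1]
    n_legit = sum(1 for _, y, _ in rows if y == 0)
    new = smote_like(fraud, n_legit - len(fraud), rng)
    return rows + [(x, 1, True) for x in new]  # True marks a synthetic row

def stratified_split(rows, test_fraction, rng):
    """Split each class separately so both parts keep some fraud rows."""
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test += group[:n_test]
        train += group[n_test:]
    return train, test

rng = random.Random(42)
# Toy data: rows are (features, label, is_synthetic); 200 legitimate, 10 fraud.
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0, False) for _ in range(200)]
data += [((rng.gauss(3, 1), rng.gauss(3, 1)), 1, False) for _ in range(10)]

# Approach 1: resample the full dataset, then split.
train_1, test_1 = stratified_split(balance(data, rng), 0.3, rng)

# Approach 2: split first, then resample only the training part.
train_2_raw, test_2 = stratified_split(data, 0.3, rng)
train_2 = balance(train_2_raw, rng)

for name, test in (("approach 1", test_1), ("approach 2", test_2)):
    n_fraud = sum(1 for _, y, _ in test if y == 1)
    n_synth = sum(1 for _, _, s in test if s)
    print(name, "test rows:", len(test), "fraud:", n_fraud, "synthetic:", n_synth)
```

In this toy setting, the approach 1 test set is itself balanced and consists largely of synthetic fraud points interpolated from real fraud rows that may sit in the training set, whereas the approach 2 test set keeps the original class imbalance and contains only real transactions. This is one plausible reading of why scores measured under the first approach can look far better than those measured under the second.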

Related Work
Dataset
Approaches for Imbalanced Datasets
Resampling Techniques
Machine Learning Algorithms
K-Nearest Neighbors
Logistic Regression
Decision Tree
Random Forest
AdaBoost
XGBoost Algorithm
Evolution
Classification Metrics Performance
Evaluation Measures
Combined Evaluation Metrics
Graphical Evaluation Performance
Imbalanced Classification Markov Decision Process
Imbalanced
Deep Q-Learning for ICMDP Subsection
Influence of Reward Function
Results
Results of Machine Learning Models Based on Imbalanced CCF Dataset
Results of Machine Learning Models Based on Resampling on Approach 1
12. Fundamental
14. Combined
10.15. Likelihood Ratio
Results of Machine
21. Combined
Results of Deep Reinforcement for Imbalanced CCF Dataset
Conclusions and Future Works