Abstract

Credit card fraud is a major form of financial fraud, and the losses it causes are larger than in the past. Several protection mechanisms, such as Fraud Prevention Systems (FPSs), are used to combat credit card fraud, but these methods are not efficient enough to reduce its impact. It is therefore important that credit card companies can recognize and prevent fraudulent transactions so that customers are not charged for items they did not purchase. In this research, we present our approach to classifying transactions as legitimate or fraudulent on the Credit Card Fraud Detection Dataset from Kaggle. The work is carried out in two stages and evaluated with four performance metrics: Precision, Recall, F1-score and ROC-AUC. In the first stage, the original imbalanced and skewed dataset was used to train and evaluate six supervised machine learning classifiers: Extreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbour (KNN), Decision Tree (DT), Logistic Regression (LR), and Naïve Bayes (NB). In the second stage, the same classifiers were trained and evaluated on the same dataset after resampling it with SMOTETomek, a combination of over-sampling and under-sampling, to remove the class imbalance. The results of the two stages were compared to select the best overall performance. The second stage, in which the models were trained, tested and evaluated on the resampled dataset, gave the best overall results: XGBoost, RF and DT each scored 100% in Precision, Recall, F1-score and ROC-AUC.
Compared with the results of all the papers reviewed in Section 2 of this work, our research achieved the best performance, with three different classifiers recording 100% in all four metrics used in the evaluation.
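
The four evaluation metrics named above can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function names and the toy labels are assumptions made for illustration, and the class label 1 is assumed to mean "fraudulent". The sketch shows why accuracy alone is misleading on an imbalanced dataset: a model that flags nothing as fraud on 98 legitimate and 2 fraudulent transactions reaches 0.98 accuracy while its recall is 0.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (assumed: fraud) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all transactions that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (pairwise form, O(n_pos * n_neg))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy imbalanced sample (hypothetical): 98 legitimate, 2 fraudulent.
    y_true = [0] * 98 + [1] * 2
    y_pred = [0] * 100  # a model that never predicts fraud
    print("accuracy:", accuracy(y_true, y_pred))
    print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

The pairwise ROC-AUC form is chosen for readability; on a dataset of realistic size, a rank-based implementation from an established library would be the practical choice.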

