Handling Imbalanced Data for Credit Card Fraud Detection

Istiak Ahmed Mondal, Md Enamul Haque, Al-Maruf Hassan, Swakkhar Shatabda

doi:10.1109/iccit54785.2021.9689866

Abstract

With the rising trend in online transactions, the threat of financial fraud is also rising, making an effective Fraud Detection System (FDS) more necessary than ever. To develop such a system, financial institutions are moving towards machine learning-based approaches because of their effectiveness. Machine learning-based systems need historical data to learn from. Because fraud cases are rare, the number of positive-labeled samples in financial fraud datasets is very small, and the datasets are imbalanced. As a result, a machine learning-based FDS is highly likely to produce misleading results. To counter this problem, Machine Learning (ML) researchers use solutions based on data-level approaches, algorithm-level approaches, feature engineering, ensemble models, or any combination of them. In this paper, we propose to use Generative Adversarial Network (GAN) based synthetic data generation to handle the data imbalance problem, followed by an ensemble classifier for classification. We use a standard benchmark dataset of credit card fraud data. In our experiments, we apply both traditional oversampling/undersampling and GAN-based techniques from the data-level approach and investigate their effectiveness using ML algorithms and ensemble models. We find GAN-based sampling to be more effective and more stable in performance than traditional oversampling techniques for both ML and ensemble models. Experiments also suggest that the combination of GAN-based sampling and ensemble models provides the best results. Among the traditional oversampling techniques, we find the Synthetic Minority Oversampling Technique (SMOTE) to provide more stable results than Adaptive Synthetic Sampling (ADASYN).
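
To make the data-level idea concrete, the following minimal sketch shows the interpolation step at the core of SMOTE-style oversampling: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. This is an illustration only, not the authors' implementation. The function name `smote_like_oversample`, the toy fraud points, the class sizes, and the choice `k=3` are all assumptions made for the example, and it uses only the Python standard library rather than a dedicated resampling package.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: four fraud samples with two features each,
# to be balanced against an assumed 20 legitimate samples.
fraud = [(5.0, 5.0), (5.5, 4.8), (6.0, 5.2), (5.2, 5.9)]
n_legit = 20

new_fraud = smote_like_oversample(fraud, n_new=n_legit - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud))  # 20
```

Because every synthetic point lies between two real minority samples, this kind of oversampling stays inside the region already covered by the minority class. A GAN-based generator, as proposed in the paper, instead learns to sample from an approximation of the minority distribution.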

Full Text