Abstract

Detecting fraud in credit card transactions is challenging because the datasets are often high-dimensional and severely class-imbalanced. Data reduction techniques such as data sampling and feature selection address both problems. In this study, we compare four data reduction approaches: data sampling alone, feature selection alone, data sampling followed by feature selection, and feature selection followed by data sampling. We also report results using all features for comparison. For data reduction, we adopt ensemble supervised feature selection (SFS) techniques and Random Undersampling (RUS). We build classification models using five Decision Tree-based classifiers and Logistic Regression, and evaluate them with two performance metrics: the Area Under the Receiver Operating Characteristic Curve (AUC) and the Area Under the Precision–Recall Curve (AUPRC). The experimental results show that all four data reduction approaches can improve classifier performance. This matters because the classifiers available to a practitioner depend on the application domain, computing environment, and licensing agreements, whereas these data reduction techniques can be applied independently of all those constraints. We recommend ensemble SFS followed by RUS (SFS–RUS) as the preferred data reduction method because it allows feature selection and data sampling to run in parallel. Additionally, we find that XGBoost and CatBoost outperform the other classifiers.
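To make the recommended ordering concrete, the following is a minimal sketch of feature selection followed by random undersampling, written with the Python standard library only. It is an illustration under stated assumptions, not the study's implementation: the two rankers (`rank_by_mean_gap`, `rank_by_auc`), the mean-rank aggregation, the 1:1 class ratio, and the synthetic data are all hypothetical stand-ins chosen for brevity, and the abstract does not specify which rankers or sampling ratio the study used.

```python
import random
from statistics import mean, pstdev


def random_undersample(X, y, ratio=1.0, seed=0):
    """RUS: keep every minority (label 1) row and sample majority (label 0)
    rows without replacement until majority:minority is about ratio:1."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def rank_by_mean_gap(X, y):
    """Hypothetical ranker 1: gap between class means, scaled by feature spread."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        spread = pstdev(col) or 1.0  # guard against constant features
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def rank_by_auc(X, y):
    """Hypothetical ranker 2: single-feature AUC, scored as |AUC - 0.5|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        scores.append(abs(wins / (len(pos) * len(neg)) - 0.5))
    return scores


def ensemble_select(X, y, rankers, k):
    """Sum each feature's rank position across rankers; keep the k best."""
    n_features = len(X[0])
    total_rank = [0.0] * n_features
    for ranker in rankers:
        scores = ranker(X, y)
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for position, j in enumerate(order):
            total_rank[j] += position
    best = sorted(range(n_features), key=lambda j: total_rank[j])[:k]
    return sorted(best)


def sfs_then_rus(X, y, k, ratio=1.0, seed=0):
    """Select features on the full data, then undersample the reduced data."""
    selected = ensemble_select(X, y, [rank_by_mean_gap, rank_by_auc], k)
    X_reduced = [[row[j] for j in selected] for row in X]
    X_bal, y_bal = random_undersample(X_reduced, y, ratio, seed)
    return selected, X_bal, y_bal


# Synthetic, imbalanced toy data (2% positives); only feature 0 is informative.
rng = random.Random(42)
y = [1] * 20 + [0] * 980
X = [[rng.gauss(5.0 if label else 0.0, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)]
     for label in y]

selected, X_bal, y_bal = sfs_then_rus(X, y, k=1)
print(selected, len(y_bal), sum(y_bal))
```

In this sketch the feature ranking is computed on the full dataset and the row sampling does not depend on which columns were kept, so the two steps do not need each other's output. A classifier would then be trained on `X_bal` and `y_bal` and scored with AUC and AUPRC on held-out data that has not been undersampled.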
