Dynamic Stacked Ensemble with Entropy based Undersampling for the Detection of Fraudulent Transactions

Ramesh Naidu Laveti, Abhay Anand Mane, Supriya N Pal

doi:10.1109/i2ct51068.2021.9417896

Abstract

Fraud detection in finance, such as identifying fraudulent credit card or debit card transactions, remains an open research problem for which solutions are still evolving. In a financial transactions database, fraudulent transactions are very rare, which creates a class imbalance. Imbalanced data poses serious challenges to a fraud detection model. Existing solutions such as undersampling and oversampling alleviate the class imbalance problem, but they still have many limitations. For example, oversampling demands significant computational time, and undersampling suffers from the loss of samples that carry critical information about the majority class, which in turn leads to poor generalization of the fraud detection model. We developed an entropy-based undersampling method combined with a dynamic stacked ensemble model for fraud detection, which we named EUStack. To perform undersampling, it evaluates the information content of each sample using Shannon entropy and selects the most informative subset of samples from the majority class. A two-level stacked ensemble is combined with this new undersampling method to improve the generalization performance of the fraud detection model. The credit card transactions dataset by “Worldline and the Machine Learning Group of ULB, hosted at Kaggle” was used to verify the robustness of EUStack. The dataset is highly imbalanced, with 492 fraudulent transactions (0.172%) out of 284,807 transactions. EUStack was evaluated using the F1 score and the Matthews Correlation Coefficient (MCC). Experimental results demonstrate that it achieved high F1 (0.88) and MCC (0.88) scores compared with conventional undersampling-based fraud detection methods.
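
To make the two technical ingredients of the abstract concrete, the following is a minimal Python sketch, not the EUStack implementation. It assumes that each majority-class sample already has an estimated fraud probability (for example, from some preliminary classifier; the abstract does not say how entropy is obtained per sample), scores each sample by the Shannon entropy of that probability, and keeps the k highest-entropy samples. It also computes F1 and MCC from confusion-matrix counts using their standard definitions. The names `binary_entropy`, `entropy_undersample`, `f1_and_mcc`, and the toy numbers are hypothetical and chosen for illustration only.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli distribution with P(fraud) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_undersample(majority_probs, k):
    """Return the indices of the k majority-class samples with the highest entropy.

    majority_probs holds one estimated fraud probability per majority-class
    sample. Where these probabilities come from is an assumption of this
    sketch; the abstract does not specify it.
    """
    ranked = sorted(
        range(len(majority_probs)),
        key=lambda i: binary_entropy(majority_probs[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def f1_and_mcc(tp, fp, fn, tn):
    """Compute F1 and the Matthews Correlation Coefficient from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Toy data: six legitimate transactions with hypothetical fraud probabilities.
probs = [0.01, 0.45, 0.02, 0.60, 0.30, 0.001]
kept = entropy_undersample(probs, k=3)
print(kept)  # indices of the samples whose probabilities are closest to 0.5

# Toy confusion matrix, unrelated to the results reported in the abstract.
f1, mcc = f1_and_mcc(tp=8, fp=2, fn=2, tn=88)
print(round(f1, 3), round(mcc, 3))
```

In this sketch, samples whose estimated probability lies near 0.5 are treated as the most informative, since they sit closest to the decision boundary; samples the estimator is already confident about are discarded first. MCC is reported alongside F1 because it also accounts for true negatives, which dominate in a dataset as imbalanced as the one described.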

Full Text