Research on improvements of fraud detection system: basing on improved machine learning algorithms

Zhiding Zhang

doi:10.54254/2755-2721/27/20230146

Abstract

Commercial fraud is now common in many industries. However, obstacles such as concept drift, imbalanced datasets and the uneven distribution of fraud entries can cause a Fraud Detection System (FDS) to miss such behavior. Among these problems, most research focuses on handling skewed datasets. This paper first presents common application scenarios of FDS, namely credit card fraud, insurance fraud and supply chain fraud. It then introduces five representative methods for addressing the problems above: K Nearest Neighbors-Synthetic Minority Oversampling Technique-Long Short-Term Memory Networks (kNN-SMOTE-LSTM), Generative Adversarial Nets-AdaBoost-Decision Tree (GAN-AdaBoost-DT), Wasserstein GAN-Kernel Density Estimation-Gradient Boosting DT (WGAN-KDE-GBDT), Time-LSTM (TLSTM) and Adaptive Synthetic Sampling-Sequential Forward Selection-Random Forest (ADASYN-SFS-RF). kNN-SMOTE-LSTM uses kNN as a discriminating classifier so that only true samples are retained. GAN-AdaBoost-DT generates new samples without referring to real transactions. WGAN-KDE-GBDT uses the Wasserstein distance as its distance measure instead, which improves training speed and guarantees successful generation. TLSTM tries to account for the weights of different time intervals and measures the similarity between the simulated behavior and the genuine behavior. ADASYN-SFS-RF employs the SFS algorithm, based on RF, to retain only the optimal feature subsets. Finally, the reported metrics show that these improved algorithms do improve the overall performance of FDS, albeit with limitations on some indicators.
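To make the oversample-then-filter idea behind kNN-SMOTE-LSTM concrete, the following is a minimal sketch in plain Python. It is not the implementation evaluated in this paper. The function names (`smote`, `knn_filter`), the toy data and the majority-vote retention rule are all assumptions made for illustration, and the LSTM classification stage is omitted.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def knn_filter(synthetic, majority, minority, k=3):
    """Keep synthetic points that a kNN vote over the real data labels as minority.

    Assumed rule: a point is retained when more than half of its k nearest
    real neighbours belong to the minority (fraud) class.
    """
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for s in synthetic:
        nearest = sorted(labelled, key=lambda pl: euclidean(s, pl[0]))[:k]
        minority_votes = sum(label for _, label in nearest)
        if 2 * minority_votes > k:
            kept.append(s)
    return kept


# Toy data (hypothetical): legitimate transactions near the origin,
# fraudulent ones near (5, 5).
majority = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.1), (0.2, -0.4), (0.1, 0.3), (-0.2, -0.1)]
minority = [(5.0, 5.0), (5.5, 5.2), (4.8, 5.1), (5.2, 4.9)]

synthetic = smote(minority, n_new=10, k=3)
kept = knn_filter(synthetic, majority, minority, k=3)
```

In this toy setting the two classes are well separated, so the filter has little to reject. The filter matters when interpolated points fall into regions dominated by legitimate transactions, which is where plain SMOTE tends to add noise.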

Full Text