Abstract

Class imbalance with overlap is a very challenging problem in electronic fraud transaction detection. Fraudsters go to great lengths to make a fraudulent transaction resemble a genuine one so that it escapes detection. As a result, many fraudulent transactions overlap with genuine transactions and are hard to distinguish from them. However, machine-learning-based fraud detection methods have mostly focused on class imbalance rather than on overlap. This paper proposes a novel hybrid method, based on a divide-and-conquer idea, to handle class imbalance with overlap. First, an anomaly detection model is trained on the minority samples and used to exclude both a few outliers of the minority class and a large number of majority samples from the original dataset. The remaining samples form an overlapping subset that has a lower imbalance ratio than the original dataset and suffers less learning interference from both the minority class and the majority class. This difficult overlapping subset is then handled by a non-linear classifier that separates the two classes. To ensure that the overlapping subset has good properties, we propose a novel assessment criterion, Dynamic Weighted Entropy (DWE), to evaluate its quality. DWE is a specially designed trade-off between the number of excluded minority-class outliers and the class imbalance ratio of the overlapping subset. With the help of DWE, the time spent searching for good hyper-parameters is dramatically reduced. Extensive experiments on the Kaggle fraud detection dataset and a large real electronic transaction dataset demonstrate that our method significantly outperforms state-of-the-art ones.
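The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the anomaly detector is replaced by a simple sphere around the minority centroid, the non-linear classifier by a k-nearest-neighbour vote, and the data are synthetic. All function names and the `keep_fraction` parameter are hypothetical. The abstract does not give the DWE formula, so the threshold is set by hand here rather than selected with DWE.

```python
import math
import random

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def fit_minority_sphere(minority, keep_fraction=0.9):
    """Stand-in anomaly detector (assumption): a sphere around the minority
    centroid whose radius covers `keep_fraction` of the minority samples."""
    center = centroid(minority)
    dists = sorted(math.dist(center, p) for p in minority)
    index = max(0, math.ceil(keep_fraction * len(dists)) - 1)
    return center, dists[index]

def overlapping_subset(X, y, center, radius):
    """Keep only samples inside the sphere. Minority outliers and distant
    majority samples are excluded."""
    return [(x, label) for x, label in zip(X, y)
            if math.dist(center, x) <= radius]

def knn_predict(train, x, k=5):
    """Stand-in non-linear classifier (assumption): k-nearest-neighbour vote."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def predict(x, center, radius, subset, k=5):
    # Samples rejected by the anomaly detector are treated as genuine (0).
    if math.dist(center, x) > radius:
        return 0
    return knn_predict(subset, x, k)

def imbalance_ratio(labels):
    """Number of majority (0) samples per minority (1) sample."""
    positives = sum(labels)
    return (len(labels) - positives) / positives

# Synthetic data: a broad genuine cloud and a small fraud cluster inside it.
rng = random.Random(0)
genuine = [[rng.gauss(0, 2.0), rng.gauss(0, 2.0)] for _ in range(2000)]
fraud = [[rng.gauss(1.5, 0.5), rng.gauss(1.5, 0.5)] for _ in range(40)]
X = genuine + fraud
y = [0] * len(genuine) + [1] * len(fraud)

center, radius = fit_minority_sphere(fraud, keep_fraction=0.9)
subset = overlapping_subset(X, y, center, radius)
subset_labels = [label for _, label in subset]

print("original imbalance ratio:", imbalance_ratio(y))
print("subset imbalance ratio:  ", imbalance_ratio(subset_labels))
print("excluded minority outliers:", len(fraud) - sum(subset_labels))
```

In this sketch, `keep_fraction` controls the trade-off that the abstract attributes to DWE: a smaller value excludes more minority samples as outliers but lowers the imbalance ratio of the remaining subset.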
