Identifying credit card fraud effectively with machine learning is an important problem in the financial sector. Machine learning models in this setting face two main challenges: a highly imbalanced distribution of sample labels and a high-dimensional set of customer features. To address both, this paper develops an enhanced logistic regression method. The approach balances the label distribution through resampling and mitigates the estimation problems caused by the curse of dimensionality. It also addresses the problem of covering the entire feature set. Because resampling can only partially address the curse of dimensionality, the method applies L1 regularization to each logistic regression submodel to alleviate it further. Results from simulation experiments and real-world data analysis show that the proposed method is competitive with standard logistic regression and several classical classification techniques. The method is effective in addressing credit card fraud risk and has the potential to be extended to other domains.
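The abstract does not specify the algorithm, so the following is only a minimal sketch of one way its ingredients could fit together, not the paper's actual method. It assumes that (a) the feature set is partitioned into disjoint blocks so that every feature is covered by some submodel, (b) each submodel is trained on a class-balanced subsample obtained by random undersampling, (c) each submodel is an L1-penalized logistic regression fitted by proximal gradient descent, and (d) submodel probabilities are averaged. All function names and default parameters (`fit_ensemble`, `n_blocks`, `lam`, and so on) are hypothetical, and features are assumed to be roughly standardized.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.01, lr=0.1, epochs=300):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def fit_ensemble(X, y, n_blocks=3, seed=0):
    """Train one submodel per feature block, each on a balanced subsample."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    rng.shuffle(features)
    n_blocks = min(n_blocks, len(features))
    # Assumption: disjoint blocks whose union is the full feature set.
    blocks = [sorted(features[k::n_blocks]) for k in range(n_blocks)]
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    k = min(len(pos), len(neg))
    models = []
    for cols in blocks:
        # Assumption: random undersampling to equal class sizes.
        rows = rng.sample(pos, k) + rng.sample(neg, k)
        Xs = [[X[i][j] for j in cols] for i in rows]
        ys = [y[i] for i in rows]
        w, b = fit_l1_logistic(Xs, ys)
        models.append((cols, w, b))
    return models


def predict_proba(models, x):
    # Assumption: average the fraud probabilities of all submodels.
    probs = [
        sigmoid(b + sum(wj * x[j] for wj, j in zip(w, cols)))
        for cols, w, b in models
    ]
    return sum(probs) / len(probs)
```

Calling `fit_ensemble(X, y)` on a list of feature rows `X` and 0/1 labels `y` returns a list of `(columns, weights, intercept)` triples, and `predict_proba(models, x)` scores a new row. Partitioning the features keeps each submodel low-dimensional while still using every feature once; a real implementation would more likely rely on an established solver than on this hand-written loop.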