In credit evaluation, feature selection and grouping effect analysis are used to identify the most relevant credit risk features. Most feature selection and grouping effect analyses are implemented by regularizing linear models. Nevertheless, substantial evidence shows that credit data are linearly inseparable because of heterogeneous credit customers and various risk sources. Although many nonlinear models have been proposed in the last two decades, the majority of them require recombination of the original features, which makes their results difficult to interpret. To cope with this dilemma, we propose a diagonal distance metric learning model that improves distance metrics by rescaling the features. Feature selection and grouping effect analysis are realized by adding regularization terms to the model. The main merit of the proposed model is that it avoids the limitation of linear models by not pursuing linear separability while still guaranteeing interpretability. We also prove and explain why feature selection and the grouping effect can be achieved, and we decompose the optimization problem into parallel linear programming problems plus a small quadratic consensus-reaching problem so that the optimization can be solved efficiently. Experiments using a real credit data set of 96,000 instances show that the proposed model improves the area under the receiver operating characteristic curve (AUC) of the distance-based classifier k-nearest neighbors by 14% in two-class credit evaluation and surpasses linear models in terms of accuracy, true positive rate, and AUC. The proposed regularized diagonal distance metric learning approach also has the potential to be applied to other fields where data are linearly inseparable.
History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.
Funding: Financial support from the National Social Science Fund of China [Grant 23BTJ040] and the National Natural Science Foundation of China [Grants 72471047 and 71910107002] is gratefully acknowledged.
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( ) as well as from the IJOC GitHub software repository ( ). The complete IJOC Software and Data Repository is available at .
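To make the idea of "improving distance metrics by rescaling the features" concrete, the following is a minimal illustrative sketch, not the authors' model or software. It assumes a diagonal metric of the form d_w(x, y) = sqrt(sum_j w_j (x_j - y_j)^2) with non-negative weights w_j, and it plugs that distance into a plain k-nearest-neighbors vote. The function names `diagonal_distance` and `knn_predict`, the weights, and the toy data are all hypothetical; in the paper the weights are learned by a regularized optimization problem, which this sketch does not attempt to reproduce.

```python
import numpy as np


def diagonal_distance(x, y, w):
    """Distance under a diagonal metric: sqrt(sum_j w_j * (x_j - y_j)^2).

    Assumption: w holds one non-negative weight per original feature.
    A weight of zero removes that feature from the distance entirely.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))


def knn_predict(X_train, y_train, x_new, w, k=3):
    """Majority vote among the k training points nearest to x_new under w."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dists = np.array([diagonal_distance(row, x_new, w) for row in X_train])
    nearest = np.argsort(dists, kind="stable")[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Hypothetical toy data: feature 0 is informative, feature 1 is noise.
X = [[0.0, 100.0], [1.0, -50.0], [10.0, 0.0], [11.0, 1.0]]
y = [0, 0, 1, 1]
query = [0.5, 0.5]

pred_unweighted = knn_predict(X, y, query, w=[1.0, 1.0], k=1)
pred_selected = knn_predict(X, y, query, w=[1.0, 0.0], k=1)
```

Because each weight is attached to one original feature, a zero weight can be read directly as "feature dropped" without any recombination of features, which is the interpretability argument the abstract makes. In this toy case the noisy second feature misleads the unweighted neighbor search, and zeroing its weight changes the prediction.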