Abstract
Several studies have demonstrated the high prediction accuracy of clustered credit risk modeling. In clustered modeling, borrowers are segmented by similarity through cluster analysis, and a separate predictive model is developed for each cluster, which increases predictive accuracy. The effectiveness of this approach depends on the quality of the segmentation, which in turn depends primarily on the choice of variables used in the cluster analysis. However, selecting appropriate clustering variables is a major challenge, particularly for high-dimensional data. In the present study, we propose a machine learning-based variable selection method grounded in theoretical and regulatory considerations. Specifically, the most influential risk drivers of a best-in-class machine learning model are identified using Shapley values and employed as clustering variables. Thus, the information in the explanatory variables that is crucial for predicting the dependent variable is already used during data segmentation, making each individual predictive model more effective. Through a comparative analysis using two real-world credit default datasets, we show that our proposed approach to clustered modeling achieves the highest prediction accuracy among the clustering models compared.
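The selection step described above (rank variables by their Shapley values, then cluster on the top-ranked ones) can be illustrated with a minimal, standard-library-only sketch. Everything concrete here is an assumption made for illustration and not the study's implementation: `risk_score` is a hypothetical linear stand-in for the best-in-class model, the data are random toy values, absent features are replaced by the sample mean, importance is taken as the mean absolute Shapley value, and the clustering is a plain k-means with two clusters.

```python
import itertools
import math
import random


def shapley_values(predict, x, background):
    """Exact Shapley values for one row; absent features take background values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                keep = set(subset)
                without_i = [x[j] if j in keep else background[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical stand-in for the trained model: a fixed linear risk score.
WEIGHTS = [2.0, 0.0, -1.5, 0.1]


def risk_score(row):
    return sum(w * v for w, v in zip(WEIGHTS, row))


# Toy borrower data (random values, not real credit data).
rng = random.Random(0)
rows = [[rng.random() for _ in WEIGHTS] for _ in range(40)]
background = [sum(col) / len(rows) for col in zip(*rows)]

# Global importance of each variable: mean absolute Shapley value over all rows.
importance = [0.0] * len(WEIGHTS)
for row in rows:
    for j, value in enumerate(shapley_values(risk_score, row, background)):
        importance[j] += abs(value) / len(rows)

# Keep the two most influential variables and cluster on them only.
selected = sorted(range(len(WEIGHTS)), key=lambda j: importance[j], reverse=True)[:2]
points = [[row[j] for j in selected] for row in rows]
labels = kmeans(points, k=2)

# Row indices per cluster; a separate predictive model would be fit on each segment.
segments = {c: [i for i, lab in enumerate(labels) if lab == c] for c in set(labels)}
```

The exact enumeration above scales exponentially with the number of variables, so it is only practical for a handful of features; on high-dimensional data an approximate Shapley estimator would presumably be needed.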