Abstract

As providers of loans to SMEs, banks must support policy implementation by continuing to lend to SMEs while prudently examining loan risk to control their own exposure. Existing loan risk measurement tools include multiple discriminant analysis models, multiple regression models, and machine learning methods. On historical data, most machine learning methods achieve higher prediction accuracy than traditional models, but problems such as overfitting seriously undermine their robustness. This study introduces the mean clustering method into SME loan default risk prediction, with preset penalty terms to reduce overfitting while maintaining high accuracy, to help banks effectively identify the default probability of SMEs during the loan period. The mean clustering method is trained iteratively on 900,000 SME credit records published by the US Small Business Administration (SBA). The records have 27 dimensions and cover loans for which the SBA provides partial guarantees. Regression trees evaluate the data, and the scores of multiple regression trees are combined to produce a final prediction of the probability of credit default for the input data. The results show that, in the SME loan default prediction scenario, the mean clustering method effectively improves on the prediction accuracy of traditional machine learning methods and multiple linear regression, and reduces overfitting and black-box properties. As a supplementary loan default risk measurement tool, it can strengthen the ability of commercial banks to control the risk of their loan business and can also, to a certain extent, promote the development of small- and medium-sized enterprises and the market economy.
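
The step of combining the scores of multiple regression trees into a default probability can be illustrated with a minimal sketch. The abstract does not state how the scores are combined, so the sketch assumes they are summed and passed through a logistic function, as is common in additive tree ensembles. The `Stump` class, the feature names, the thresholds, and the leaf scores are all hypothetical and are not taken from the SBA data or from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree: one split and two leaf scores (hypothetical)."""
    feature: str
    threshold: float
    left_score: float   # returned when the feature value is below the threshold
    right_score: float  # returned when the feature value is at or above it

    def score(self, record: dict) -> float:
        value = record[self.feature]
        return self.left_score if value < self.threshold else self.right_score


def default_probability(trees: list, record: dict, base_score: float = 0.0) -> float:
    """Sum the per-tree scores and map the total to a probability in (0, 1).

    Assumption: additive combination with a logistic link.
    """
    raw = base_score + sum(tree.score(record) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw))


# Hypothetical ensemble; feature names and numbers are for illustration only.
trees = [
    Stump("term_months", 60.0, left_score=0.8, right_score=-0.4),
    Stump("guaranteed_share", 0.5, left_score=0.3, right_score=-0.2),
    Stump("employees", 10.0, left_score=0.1, right_score=-0.1),
]

record = {"term_months": 36.0, "guaranteed_share": 0.75, "employees": 4.0}
p_default = default_probability(trees, record)
print(f"estimated default probability: {p_default:.3f}")
```

In this sketch each tree contributes a small positive or negative score, so no single tree decides the outcome. A penalty on leaf scores or tree count during training, which is not shown here, would be one way to realise the overfitting control the abstract mentions.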
