In the past, analysts used rule-based approaches to decide whether to offer loans to individual applicants. However, because of a sudden rise in the number of applicants and a shortage of labor, financial institutions have turned to quantitative decision-making methods and constructed credit scoring models. In this essay, a random forest model, a support vector machine regression model, and a Probit model are fitted and compared on a dataset from a major U.S. credit card company. The results show that while machine learning techniques can improve the efficiency and accuracy of credit risk assessment, they also face some problems and limitations. The random forest model can handle high-dimensional data and is not complicated to run, but its classification accuracy is lower on datasets with few features or samples. The support vector machine regression model achieves high accuracy and prevents overfitting to some degree, but it is sensitive to the choice of kernel parameters and regularization term. The Probit model produces more accurate results once the significance of the Mills ratio is tested, but it is more complex than the other two models. In future research, we propose to extend this work with additional artificial intelligence algorithms and evaluation metrics.