Abstract
Credit risk evaluation has become indispensable in the financial sector. This study compares the performance of six machine learning models, including Random Forest (RF) and Support Vector Machine (SVM), in predicting credit risk, using a comprehensive set of metrics. The features used to train the models were screened by combining Random Forest feature importance with Recursive Feature Elimination (RFE) to support model accuracy. Across all the risk columns, the Random Forest model combined with RFE performed best, with an accuracy of 0.71, followed by K-Nearest Neighbors (KNN) with an accuracy of 0.69. Logistic regression performed worst of the six models, with an accuracy of only 0.29. Class imbalance in the dataset led to weak identification of the moderate-risk category, which shows that the model is not well adapted to datasets with imbalanced categories. The paper validates the viability of machine learning for credit risk prediction and offers practical advice on how it may be applied. To further improve prediction performance, future studies could combine more advanced data-balancing strategies with deep learning approaches.
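The feature screening step described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' implementation. The synthetic three-class dataset, the class weights, the number of retained features, and the helper name `select_and_score` are all assumptions made for illustration, since the abstract does not specify the dataset or the settings used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def select_and_score(n_features_to_select=8, seed=0):
    """Hypothetical helper: RFE driven by Random Forest importances, then RF accuracy."""
    # Assumption: synthetic, imbalanced three-class data standing in for
    # low / moderate / high risk labels; the minority class plays "moderate".
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=6, n_redundant=2,
        n_classes=3, weights=[0.6, 0.1, 0.3], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # RFE refits the forest repeatedly and drops the least important
    # feature each round until n_features_to_select remain.
    selector = RFE(
        RandomForestClassifier(n_estimators=50, random_state=seed),
        n_features_to_select=n_features_to_select,
        step=1,
    )
    selector.fit(X_train, y_train)

    # Train the final forest on the retained features only.
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(selector.transform(X_train), y_train)
    predictions = model.predict(selector.transform(X_test))

    return selector.support_, accuracy_score(y_test, predictions)


if __name__ == "__main__":
    mask, accuracy = select_and_score()
    print("retained features:", int(mask.sum()), "accuracy:", round(accuracy, 2))
```

Because overall accuracy can hide poor recall on a minority class, a per-class report (for example `sklearn.metrics.classification_report`) would be a natural addition when checking how well the moderate-risk category is identified.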