Traditional logistic regression is still the most widely used method for credit scoring. Recent developments have introduced new instruments from machine learning, including random forests. In this paper, we tested the efficiency of logistic regression and XGBoost for default forecasting on a sample of 35,535 cases from 7 business sectors of Italian SMEs, using 28 banking variables and 55 balance sheet ratios, to verify which approach better supports lending decisions. To this end, we developed an efficiency index that measures each model's capability to correctly select good borrowers, balancing the different effects of refusing a loan to a good customer and lending to a defaulter. We also computed the balancing spread to quantify the models' efficiency in terms of credit costs for the borrowing firms. The results differ across sectors. Overall, however, the two methods show similar capabilities, while the cutoff setting can make a substantial difference when the models are actually used for lending decisions.
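To illustrate why the cutoff can matter more than the choice of model, the following is a minimal sketch of a cost-weighted error count at different cutoffs. It is not the efficiency index or the balancing spread developed in the paper, whose definitions are not given in this abstract. The function names, the toy scores, and the cost weights (`cost_reject_good`, `cost_accept_bad`) are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def error_counts(pd_scores: Sequence[float], defaulted: Sequence[int],
                 cutoff: float) -> Tuple[int, int]:
    """Return (rejected_good, accepted_bad) when loans with PD >= cutoff are refused."""
    pairs = list(zip(pd_scores, defaulted))
    rejected_good = sum(1 for p, d in pairs if p >= cutoff and d == 0)
    accepted_bad = sum(1 for p, d in pairs if p < cutoff and d == 1)
    return rejected_good, accepted_bad


def weighted_cost(pd_scores: Sequence[float], defaulted: Sequence[int],
                  cutoff: float, cost_reject_good: float = 1.0,
                  cost_accept_bad: float = 5.0) -> float:
    """Total cost of both error types; the weights are illustrative assumptions."""
    rejected_good, accepted_bad = error_counts(pd_scores, defaulted, cutoff)
    return cost_reject_good * rejected_good + cost_accept_bad * accepted_bad


def best_cutoff(pd_scores: Sequence[float], defaulted: Sequence[int],
                cutoffs: Sequence[float], **costs: float) -> float:
    """Pick the candidate cutoff with the lowest weighted cost."""
    return min(cutoffs, key=lambda c: weighted_cost(pd_scores, defaulted, c, **costs))


# Toy data (invented): predicted default probabilities and observed outcomes (1 = default).
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
defaulted = [0, 0, 1, 0, 1, 1]
cutoffs = [0.15, 0.30, 0.50, 0.70]

# Lending to a defaulter weighted more heavily than refusing a good borrower.
strict = best_cutoff(scores, defaulted, cutoffs)
# Reversed weighting: refusing a good borrower is the costlier error.
lenient = best_cutoff(scores, defaulted, cutoffs,
                      cost_reject_good=1.0, cost_accept_bad=0.5)
print(strict, lenient)
```

With the same scores, changing only the relative cost of the two errors moves the preferred cutoff. This is the sense in which two models with similar discriminating power can still lead to different lending decisions.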