This study addresses the quantification of credit risk in solidarity economy entities and proposes a new methodology that redefines the concept of “default” for the frequent situations of extreme class imbalance. The objective is to develop and evaluate credit scoring models that enhance risk management by incorporating internal and external data to assess default risk. Data mining techniques are applied to address class imbalance: the definition of “default” is broadened to include external credit information, which increases the representation of the minority class. The effectiveness of machine learning and statistical models is evaluated using class-balancing methods such as under-sampling, over-sampling, and the Synthetic Minority Over-sampling Technique (SMOTE). The evaluation is based on the Balanced Accuracy metric and on the stability of that performance, so that the model's predictive power remains consistent and overfitting is avoided. While machine learning methods can improve credit scoring, logistic regression-based models remain effective, especially when combined with class-balancing techniques. It is concluded that a sample balanced in class size is essential to improve predictive performance.
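To make the evaluation metric and the simpler balancing methods concrete, the following is a minimal sketch, not the study's implementation. It assumes binary labels (1 = default, 0 = non-default); the function names `balanced_accuracy`, `random_oversample`, and `random_undersample` and the toy data are hypothetical and chosen only for illustration. SMOTE, which synthesizes new minority points by interpolating between neighbouring minority samples, is not implemented here.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of recall on class 1 (sensitivity) and recall on class 0 (specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def _indices_by_class(y):
    """Group row indices by label; returns (minority_indices, majority_indices)."""
    groups = {0: [], 1: []}
    for i, label in enumerate(y):
        groups[label].append(i)
    minority = min(groups, key=lambda c: len(groups[c]))
    return groups[minority], groups[1 - minority]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _indices_by_class(y)
    deficit = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, seed=0):
    """Drop randomly chosen majority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _indices_by_class(y)
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy portfolio: 95 non-defaults and 5 defaults.
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A model that never predicts default is right on 95 of 100 rows,
# yet its balanced accuracy shows it has no skill on the minority class.
always_non_default = [0] * 100
score = balanced_accuracy(y, always_non_default)

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

On this toy portfolio, the constant "non-default" prediction scores 0.5 on balanced accuracy, which illustrates why the metric suits heavily imbalanced default data better than plain accuracy. In practice, resampling would be applied to the training split only, so that the evaluation data keeps its original class proportions.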