An accurate and interpretable credit scoring model plays a crucial role in helping financial institutions reduce losses by promptly detecting, containing, and preventing defaults. However, existing models often face a trade-off between interpretability and predictive accuracy: traditional models such as Logistic Regression (LR) offer high interpretability but limited predictive performance, while more complex models improve accuracy at the expense of interpretability. In this paper, we tackle the credit scoring problem with imbalanced data by proposing two new classification models based on the optimal classification tree with hyperplane splits (OCT-H), which provides transparency and easy interpretation through 'if-then' decision tree rules. The first model, the cost-sensitive optimal classification tree with hyperplane splits (CSOCT-H), incorporates class-dependent misclassification costs to address the imbalance between defaulters and non-defaulters. The second model, the optimal classification tree with hyperplane splits based on maximizing F1-score (OCT-H-F1), directly maximizes the F1-score. To enhance model scalability, we introduce a data sample reduction method using data binning and feature selection. We then propose two solution methods: a heuristic approach and a method utilizing warm-start techniques to accelerate the solving process. We evaluate the proposed models on four public datasets. The results show that OCT-H significantly outperforms traditional interpretable models, such as Decision Trees (DT) and LR, in both predictive performance and interpretability. On certain datasets, OCT-H performs as well as or better than advanced ensemble tree models, effectively narrowing the gap between interpretable models and black-box models.
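For reference, the F1-score that OCT-H-F1 maximizes is the standard harmonic mean of precision and recall; expressed in terms of true positives (TP), false positives (FP), and false negatives (FN), this general definition (not the paper's specific optimization formulation, which the abstract does not detail) is

\[ \mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. \]

Because F1 ignores true negatives, it is a natural objective under class imbalance, where the rare defaulter class would otherwise be swamped by accuracy-style criteria.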