Abstract

This paper introduces flexible loss functions for binary classification in Gradient-Boosted Decision Trees (GBDT) that combine Dice-based and cross-entropy-based losses and offer link functions derived from either a generalized extreme value (GEV) or an exponentiated exponential logistic (EEL) distribution. Testing 27 GBDT models built with XGBoost on a Freddie Mac mortgage loan database showed that the choice of loss function has a material effect on performance. Specifically, when the class imbalance ratio (IR) is below 99, using a skewed GEV-based link function in XGBoost enhances discriminatory power and classification accuracy while retaining a simple model structure, which is particularly important in credit scoring applications. Under severe class imbalance, typically at IRs between 99 and 200, an advanced loss function composed of a symmetric hybrid loss and a link derived from a positively skewed EEL distribution outperforms the other XGBoost variants. The accuracy gains from these proposed extensions translate into lower misclassification costs, most notably when the IR is below 99, and hence into higher business profitability. The study also highlights the transparency of GBDT, an essential requirement in financial applications. Researchers and practitioners can use these insights to build more accurate and discriminative machine learning models, with possible extensions to other GBDT implementations and to other machine learning techniques whose training is driven by a loss function. The source code for the proposed approach is publicly available at https://github.com/jm-ml/flexible-losses-for-binary-classification-with-GBDT.
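To illustrate the general mechanism, the sketch below implements a cross-entropy loss with a GEV link as a custom XGBoost objective. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the shape parameter XI, the helper names gev_cdf, gev_pdf, and gev_logloss, and the Gauss-Newton Hessian approximation are all illustrative choices; the authors' implementation is in the repository linked above.

```python
import numpy as np
import xgboost as xgb

XI = -0.25  # hypothetical GEV shape parameter (xi != 0 assumed); the paper tunes this

def gev_cdf(z, xi=XI, eps=1e-12):
    """GEV inverse link: maps a raw margin z to a probability."""
    # Enforce the GEV support constraint 1 + xi*z > 0 by clipping.
    u = np.maximum(1.0 + xi * z, eps)
    t = u ** (-1.0 / xi)
    return np.clip(np.exp(-t), eps, 1.0 - eps)

def gev_pdf(z, xi=XI, eps=1e-12):
    """Standard GEV density, the derivative of the CDF above."""
    u = np.maximum(1.0 + xi * z, eps)
    t = u ** (-1.0 / xi)
    return t ** (xi + 1.0) * np.exp(-t)

def gev_logloss(preds, dtrain):
    """Custom XGBoost objective: cross-entropy with a GEV link.

    Returns the gradient of the negative log-likelihood w.r.t. the raw
    margin and a Gauss-Newton (always positive) Hessian approximation.
    """
    y = dtrain.get_label()
    p = gev_cdf(preds)          # predicted default probability
    f = gev_pdf(preds)          # dp/dz via the chain rule
    grad = (p - y) / (p * (1.0 - p)) * f
    hess = f ** 2 / (p * (1.0 - p))  # Fisher-scoring approximation
    return grad, np.maximum(hess, 1e-12)

# Usage: pass the objective to xgb.train and apply the link at predict time.
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
#                     num_boost_round=200, obj=gev_logloss)
# probs = gev_cdf(booster.predict(dtrain, output_margin=True))
```

The positive Gauss-Newton Hessian keeps each boosting step well-conditioned even where the exact second derivative of the GEV log-likelihood could be negative; a hybrid Dice/cross-entropy variant would follow the same pattern with a different gradient.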
