Abstract

Credit scoring datasets often exhibit a significant class imbalance, particularly in portfolios of secured loans such as mortgages. Class imbalance occurs when non-default cases far outnumber default cases. A naive classifier can achieve high accuracy by assigning all cases to the majority class; however, misclassifying the minority class is often costly. We propose using the quantile function of the generalized extreme value (GEV) distribution as the link function in XGBoost, a well-known and robust classification method, to enhance the detection of rare cases. To complement the GEV link function, we apply a modified focal loss function in XGBoost that jointly penalizes misclassification of the class of interest and focuses on hard-to-classify cases. We test our proposal on a large database of mortgage loans with rare default cases, available on the Freddie Mac website. As benchmarks, we also consider other common large credit scoring databases, existing extensions of XGBoost for handling class imbalance, and other state-of-the-art techniques for learning from class-imbalanced data. The results show that the proposed model has superior predictive power compared with competing models when the class imbalance arises because default events are outliers or rare in the dataset. We also demonstrate that the results are likely to hold in real-world situations and to add business value under certain portfolio characteristics.
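To make the idea of a GEV link concrete, the following is a minimal sketch of the GEV quantile function used as a link, together with its inverse (the GEV distribution function) that maps a raw model score to a probability. It is an illustration under stated assumptions, not the authors' implementation: the function names `gev_link` and `gev_inverse_link` are hypothetical, the shape value `xi=-0.25` is arbitrary, and the location and scale are fixed at 0 and 1. The modified focal loss is not shown because the abstract does not specify its form.

```python
import numpy as np


def gev_inverse_link(eta, xi=-0.25):
    """Map a raw score eta to a probability via the GEV distribution function.

    Assumes location 0 and scale 1; xi is an arbitrary illustrative shape value.
    """
    eta = np.asarray(eta, dtype=float)
    if xi == 0.0:
        # Gumbel limit of the GEV family
        return np.exp(-np.exp(-eta))
    t = 1.0 + xi * eta
    # Outside the support (t <= 0) the distribution function is 0 when xi > 0
    # and 1 when xi < 0; substitute a safe value before exponentiating.
    safe_t = np.where(t > 0, t, 1.0)
    p = np.exp(-safe_t ** (-1.0 / xi))
    boundary = 0.0 if xi > 0 else 1.0
    return np.where(t > 0, p, boundary)


def gev_link(p, xi=-0.25):
    """GEV quantile function: map a probability in (0, 1) to a raw score."""
    p = np.asarray(p, dtype=float)
    if xi == 0.0:
        return -np.log(-np.log(p))
    return ((-np.log(p)) ** (-xi) - 1.0) / xi


# Round trip: the link and its inverse should undo each other.
probs = np.array([0.01, 0.5, 0.99])
recovered = gev_inverse_link(gev_link(probs))
```

Unlike the logit link, which maps a score of 0 to a probability of 0.5, this link is asymmetric: with location 0 and scale 1, a score of 0 maps to exp(-1), roughly 0.37, for any shape value. That asymmetry is the property commonly cited as the motivation for GEV links in rare-event classification.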
