Prescreening bank failures with K-means clustering: Pros and cons

Dror Parnes, Alper Gormus

doi:10.1016/j.irfa.2024.103222

Abstract

To assess the merits of the popular K-means clustering technique for predicting failures of commercial banks, we contrast two forecasting systems. The first comprises two complementary stages: unsupervised K-means clustering, followed by logistic regression applied to the three most hazardous of the five clusters formed. The second applies logistic regression to the entire sample of banks. We find that the first system is relatively strict: it identifies bank failures in advance more reliably, but it also projects more potential failures among solvent banks. The second system is more lenient: in many cases it does not identify actual bank failures in advance, yet it also flags fewer solvent banks as potential failures. The second system achieves slightly higher overall predictive power than the first. This minor statistical disadvantage of K-means clustering arises mainly because bank failures are scarce in practice. The K-means prescreening stage increases the systemic costs associated with type-II errors (predicting failure for solvent banks), but it simultaneously reduces the systemic costs linked to type-I errors (not predicting failure for banks that eventually fail). Overall, the K-means clustering prescreening technique offers prospective economic advantages, as it helps cut the total simulated systemic costs.
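
The cost trade-off described in the abstract can be illustrated with a minimal arithmetic sketch. It is not the authors' simulation: the error counts, the unit costs, and the names `Outcome`, `systemic_cost`, and `break_even_ratio` are all hypothetical, chosen only to mirror the qualitative pattern (a strict system with fewer type-I but more type-II errors, and a lenient system with the reverse). The sketch assumes a simple linear cost model and follows the abstract's convention that a type-I error is a missed failure and a type-II error is a false alarm on a solvent bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Error counts for one forecasting system (hypothetical numbers)."""
    missed_failures: int  # type-I errors in the abstract's convention
    false_alarms: int     # type-II errors in the abstract's convention


def systemic_cost(outcome: Outcome, cost_type1: float, cost_type2: float) -> float:
    """Linear cost model (an assumption): count times unit cost, summed."""
    return outcome.missed_failures * cost_type1 + outcome.false_alarms * cost_type2


def break_even_ratio(strict: Outcome, lenient: Outcome) -> float:
    """Ratio cost_type1 / cost_type2 above which the strict system is cheaper.

    Assumes the strict system has fewer missed failures than the lenient one.
    """
    extra_false_alarms = strict.false_alarms - lenient.false_alarms
    avoided_misses = lenient.missed_failures - strict.missed_failures
    return extra_false_alarms / avoided_misses


# Hypothetical counts: the prescreened system makes more errors in total
# (22 vs. 18) but far fewer of the expensive type-I kind.
prescreened = Outcome(missed_failures=2, false_alarms=20)
full_sample = Outcome(missed_failures=8, false_alarms=10)

# Hypothetical unit costs: a missed failure is assumed far costlier
# than a false alarm on a solvent bank.
COST_TYPE1 = 100.0
COST_TYPE2 = 5.0

cost_prescreened = systemic_cost(prescreened, COST_TYPE1, COST_TYPE2)
cost_full_sample = systemic_cost(full_sample, COST_TYPE1, COST_TYPE2)
ratio = break_even_ratio(prescreened, full_sample)

print(f"prescreened: {cost_prescreened}, full sample: {cost_full_sample}")
print(f"strict system is cheaper when cost_type1 / cost_type2 > {ratio:.3f}")
```

Under these assumed numbers the stricter system has the lower total cost despite making more errors overall, because it avoids more of the costly type-I errors. With equal unit costs the ranking would reverse, which is why the break-even ratio is reported alongside the totals.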

Full Text