Abstract

Research on corporate financial distress prediction has been ongoing for more than half a century, and of the many models that have emerged, ensemble learning algorithms are the most accurate. Most state-of-the-art methods of recent years are based on gradient-boosted decision trees. However, most of them do not use feature importance for feature selection, and the few that do often rely on biased importance measures that may not reflect the true importance of features. To address this problem, this paper proposes a heuristic algorithm based on permutation importance (PIMP) to correct the biased feature importance measure. The method ranks and filters the features used by machine learning models, which not only improves accuracy but also makes the results more interpretable. Experiments on financial data from 4,167 listed companies in China between 2001 and 2019 show that, compared with using the random forest (RF) wrapper method alone, combining it with the PIMP method indeed corrects the bias in feature importance. After redundant features are removed, the performance of most machine learning models improves. The PIMP method is therefore a promising addition to existing financial distress prediction methods. Moreover, compared with traditional statistical learning models and other machine learning models, the proposed PIMP-XGBoost offers higher prediction accuracy and clearer interpretation, making it suitable for commercial use.
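To illustrate the general idea of permutation-based feature selection described above, the following is a minimal sketch, not the authors' exact PIMP heuristic: features are ranked by permutation importance on a held-out set, low-ranked features are dropped, and an XGBoost classifier is refit. The synthetic dataset, importance threshold, and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the financial-ratio dataset used in the paper.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                      eval_metric="logloss", random_state=0)
model.fit(X_train, y_train)

# Permutation importance: shuffle each feature on the validation set and
# measure the drop in AUC; unlike impurity-based importances, this is not
# biased toward high-cardinality features.
result = permutation_importance(model, X_val, y_val, n_repeats=20,
                                scoring="roc_auc", random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

# Keep only features whose mean importance exceeds an assumed threshold.
keep = [i for i in ranking if result.importances_mean[i] > 0.001]

# Refit on the reduced feature set and check validation AUC.
reduced = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                        eval_metric="logloss", random_state=0)
reduced.fit(X_train[:, keep], y_train)
print("kept features:", keep)
print("validation AUC after selection:",
      roc_auc_score(y_val, reduced.predict_proba(X_val[:, keep])[:, 1]))
```

In practice the threshold and the number of permutation repeats would be tuned, and the paper's heuristic combines PIMP with a random forest wrapper rather than the single pass shown here.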
