Abstract

Due to the global financial crisis occurred in 2008, with a large amount of companies troubling in financial distress, the machine learning-based prediction of this dilemma has shown economic stakeholders’ great practicability. In the field of machine learning, most of the previous studies only focus on the improvement of the imbalanced datasets sampling methods or the introduction of multiple classifiers in the constructing stage for prediction model. In view of this, this paper attempts to improve the scope and depth of ensemble to achieve better prediction performance for a severely imbalanced dataset of financial data of Chinese listed companies. For the first time, this paper combines the clustering-based under-sampling (CUS) with the gradient boosting decision tree (GBDT) to construct the model, which is used along with the current widely used extreme gradient boosting (XGBoost) as heterogeneous classifier to build heterogeneous ensemble in financial distress prediction. In addition, based on the idea of ensemble, this paper uses five feature selection methods based on different theoretical backgrounds to select features, and introduces ensemble from the whole process of feature selection, data preprocessing and model construction. In the comparative experience, the method proposed by us achieves the best performance on the test set. This study demonstrates the broad application of CUS for financial data processing and the superior generalization performance of the ensemble model relative to individual learners.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.