Abstract
Credit scoring models are the cornerstone of the modern financial industry. After years of development, artificial intelligence and machine learning have transformed traditional, statistics-based credit scoring models. In this study, a novel multi-stage ensemble model with a hybrid genetic algorithm is proposed to achieve accurate and stable credit prediction. To alleviate the adverse effects of imbalanced data in credit scoring models, the Instance Hardness Threshold method is extended using a majority voting strategy. To eliminate redundant and irrelevant features in the dataset and select well-performing base classifiers, a new hybrid genetic algorithm is proposed to obtain the optimal feature subset and base classifier subset. To aggregate the predictive power of the base classifiers, a stacking approach is adopted to integrate the optimal base classifiers into the ensemble model. The proposed model is tested on three standard imbalanced credit scoring datasets, compared with similar state-of-the-art approaches, and evaluated using four well-known evaluation indicators. The experimental results support the effectiveness of the proposed model and demonstrate its superiority over the compared approaches.
Highlights
The ability to accurately assess the creditworthiness of customers who apply for loans and perform corresponding risk management is the key to the development of the modern financial industry
Classifier ensemble: ensemble models have been shown to be an effective approach for improving the performance of credit scoring models (Wang et al., 2011)
In the classifier selection procedure, an individual in the hybrid genetic algorithm (HGA) represents a candidate base classifier subset, the population in each generation consists of multiple individuals, and the optimal individual, obtained through genetic evolution, represents the optimal base classifier subset
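The encoding described in the last highlight can be sketched as a bit mask, where bit i marks whether base classifier i belongs to the candidate subset. The following is a minimal generic genetic algorithm under stated assumptions, not the paper's HGA: the function name `evolve_subset`, the tournament selection, one-point crossover, bit-flip mutation, elitism, and the toy fitness with its made-up scores are all illustrative choices that do not come from the source.

```python
import random


def evolve_subset(n_items, fitness, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a high-fitness 0/1 mask of length n_items (n_items >= 2)."""
    rng = random.Random(seed)
    # Each individual is a tuple of 0/1 flags: 1 means "classifier i is selected".
    pop = [tuple(rng.randint(0, 1) for _ in range(n_items)) for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best]  # elitism: carry the best individual forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_items - 1)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each flag flips with probability p_mut.
            child = tuple(bit ^ int(rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical validation scores for five base classifiers (made-up numbers).
scores = [0.70, 0.82, 0.55, 0.79, 0.60]


def toy_fitness(individual):
    # Assumed toy objective: reward classifiers scoring above 0.65, penalize the rest.
    return sum(s - 0.65 for s, bit in zip(scores, individual) if bit)


best = evolve_subset(len(scores), toy_fitness, seed=0)
print(best, round(toy_fitness(best), 3))
```

In a real setting the fitness of an individual would be measured by training and validating an ensemble built from the selected classifiers, which is far more expensive than this toy objective.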
Summary
The ability to accurately assess the creditworthiness of customers who apply for loans and perform corresponding risk management is the key to the development of the modern financial industry. In imbalanced credit scoring data, positive samples are defaulting customers and negative samples are non-defaulting customers. The imbalance arises because, in most real-world cases, the number of customers who pay their bills on time is much larger than the number of customers who default. Both statistics-based and machine learning-based credit scoring models struggle to make accurate predictions when imbalanced data are used directly as input. Enhancing the predictive ability of credit scoring models using imbalanced data is the first motivation of this study. Developing an effective feature selection approach is a prerequisite for lower data processing costs, a better understanding of the data, and better-performing credit scoring models. Multiple poorly performing or correlated base classifiers in an ensemble model may degrade the ensemble's performance.
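A small worked example may help show why imbalanced data make evaluation and training difficult. The sketch below assumes a made-up sample of 100 customers with 5 defaulters (these counts are illustrative and not taken from the paper's datasets) and a trivial model that predicts "non-default" for everyone. Accuracy looks high even though no defaulter is detected.

```python
# Labels: 1 = defaulting customer (positive), 0 = non-defaulting customer (negative).
y_true = [1] * 5 + [0] * 95   # made-up imbalanced sample
y_pred = [0] * 100            # trivial model: always predict "non-default"

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)   # share of all customers classified correctly
recall = tp / (tp + fn)              # share of defaulters that were detected

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.95, recall=0.00
```

A model can therefore score 95% accuracy while missing every default, which is why handling the imbalance before training, and evaluating with more than accuracy alone, matters for credit scoring.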