Abstract

Credit scoring models are the cornerstone of the modern financial industry. After years of development, artificial intelligence and machine learning have transformed traditional statistics-based credit scoring models. In this study, a novel multi-stage ensemble model with a hybrid genetic algorithm is proposed to achieve accurate and stable credit prediction. To alleviate the adverse effects of imbalanced data in credit scoring models, the Instance Hardness Threshold method is extended using a majority voting strategy. To eliminate redundant and irrelevant features in the dataset and select well-performing base classifiers, a new hybrid genetic algorithm is proposed to obtain the optimal feature subset and base classifier subset. To aggregate the predictive power of the base classifiers, a stacking approach is adopted to integrate the optimal base classifiers into the ensemble model. The proposed model is tested on three standard imbalanced credit scoring datasets, compared with similar state-of-the-art approaches, and evaluated using four well-known evaluation indicators. The experimental results demonstrate the effectiveness of the proposed model and its superiority over the compared approaches.

Highlights

  • The ability to accurately assess the creditworthiness of customers who apply for loans and perform corresponding risk management is the key to the development of the modern financial industry

  • The classifier ensemble has been proven to be an effective approach for improving the performance of credit scoring models (Wang et al., 2011)

  • In the classifier selection procedure, an individual in the hybrid genetic algorithm (HGA) represents a candidate base classifier subset, the population in each generation consists of multiple individuals, and the optimal individual represents the optimal base classifier subset obtained through genetic evolution
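The individual/population vocabulary in the last highlight can be made concrete with a small sketch. It assumes a bit-string encoding in which gene i is 1 when base classifier i is included; the source does not state the encoding. The classifier names, the scores in `TOY_SCORE`, and the fitness function are made up for illustration. The loop is a plain genetic algorithm (truncation selection, one-point crossover, bit-flip mutation, elitism), not the paper's hybrid variant.

```python
import random

# Hypothetical pool of base classifiers and made-up validation scores.
POOL = ["LR", "SVM", "RF", "KNN", "NB"]
TOY_SCORE = {"LR": 0.70, "SVM": 0.72, "RF": 0.78, "KNN": 0.65, "NB": 0.60}


def decode(individual, classifier_pool):
    """Map a bit-string individual to the base classifier subset it selects."""
    return [name for bit, name in zip(individual, classifier_pool) if bit == 1]


def toy_fitness(individual):
    """Placeholder fitness: mean made-up score minus a small size penalty.
    A real fitness would evaluate the ensemble built from the subset."""
    chosen = decode(individual, POOL)
    if not chosen:
        return 0.0
    return sum(TOY_SCORE[c] for c in chosen) / len(chosen) - 0.01 * len(chosen)


def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two parents at the given cut point."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(fitness, n_genes, pop_size=20, generations=30, seed=0):
    """Evolve a population of individuals and return the fittest one."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            point = rng.randint(1, n_genes - 1)
            for child in one_point_crossover(a, b, point):
                children.append(mutate(child, 0.05, rng))
        population = children[:pop_size]
    return max(population, key=fitness)


best = evolve(toy_fitness, len(POOL))
print(decode(best, POOL))
```

The same encoding would carry over to feature selection by letting each gene stand for a feature instead of a classifier.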

Summary

INTRODUCTION

The ability to accurately assess the creditworthiness of customers who apply for loans, and to perform the corresponding risk management, is key to the development of the modern financial industry. Credit scoring data are typically imbalanced: positive samples are defaulting customers, and negative samples are non-defaulting customers. The imbalance arises because, in most real-world cases, the number of customers who pay their bills on time is much larger than the number of customers who default. Both statistics-based and machine learning-based credit scoring models struggle to make accurate predictions when imbalanced data are input directly. Enhancing the predictive ability of credit scoring models on imbalanced data is the first motivation of this study. An effective feature selection approach is a prerequisite for lower data processing costs, a better understanding of the data, and better-performing credit scoring models. Multiple poorly performing or correlated base classifiers in an ensemble model may degrade the ensemble's performance.
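A small worked example may help show why imbalanced credit data make accurate prediction difficult. The sketch below uses a made-up portfolio of 95 non-defaulting and 5 defaulting customers (not taken from the paper's datasets) and a trivial model that labels every applicant as non-defaulting. The function names are illustrative only.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return #negative / #positive, with 1 = defaulting, 0 = non-defaulting."""
    counts = Counter(labels)
    return counts[0] / counts[1]


def accuracy(y_true, y_pred):
    """Fraction of all samples whose label was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (defaulters) that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p == 1 for p in predictions_for_positives) / len(predictions_for_positives)


# Hypothetical toy portfolio: 95 non-defaulting customers, 5 defaulting.
y_true = [0] * 95 + [1] * 5
# A trivial model that predicts "non-defaulting" for everyone.
y_pred = [0] * 100

print(imbalance_ratio(y_true))   # 19.0
print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The trivial model reaches 95% accuracy while identifying no defaulters at all, which is why accuracy alone is a weak indicator on imbalanced data.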

RELATED WORK
FEATURE SELECTION
EXPERIMENTAL DESIGN
EXPERIMENTAL SETTING The raw dataset was divided as follows
EXPERIMENTAL ANALYSIS
Method
CONCLUSION AND FUTURE WORK
