A Novel Multi-Stage Ensemble Model With a Hybrid Genetic Algorithm for Credit Scoring on Imbalanced Data

Yilun Jin, Xin Wu, Yanan Liu, Wenyu Zhang, Zeqian Hu

doi:10.1109/access.2021.3120086

Abstract

Credit scoring models are the cornerstone of the modern financial industry. After years of development, artificial intelligence and machine learning have led to the transformation of traditional credit scoring models based on statistics. In this study, a novel multi-stage ensemble model with a hybrid genetic algorithm is proposed to achieve accurate and stable credit prediction. To alleviate the adverse effects of imbalanced data in credit scoring models, the Instance Hardness Threshold method is extended using a majority voting strategy to deal with data imbalance. To eliminate redundant and irrelevant features in the dataset and select well-performing base classifiers, a new hybrid genetic algorithm is proposed to obtain the optimal feature subset and base classifier subset. To aggregate the predictive power of the base classifiers, a stacking approach is adopted to integrate the optimal base classifiers into the ensemble model. The proposed model is tested on three standard imbalanced credit scoring datasets, compared with similar state-of-the-art approaches, and evaluated using four well-known evaluation indicators. The experimental results prove the effectiveness of the proposed model and demonstrate its superiority.

Highlights

The ability to accurately assess the creditworthiness of customers who apply for loans, and to manage the corresponding risk, is key to the development of the modern financial industry.
Ensemble models have been shown to be an effective approach for improving the performance of credit scoring models (Wang et al., 2011).
In the classifier selection procedure, an individual in the hybrid genetic algorithm (HGA) represents a candidate base classifier subset, the population in each generation consists of multiple individuals, and the optimal individual obtained through genetic evolution represents the optimal base classifier subset.
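The encoding described in the last highlight can be illustrated with a minimal sketch. This is a plain genetic algorithm, not the paper's HGA, whose hybrid components are not described in this summary. The validation scores, the `BASELINE` cut-off, the `toy_fitness` function, and all hyperparameters are hypothetical values chosen only for illustration. Each individual is a bitmask over the candidate base classifiers, and the fittest individual after evolution is taken as the selected subset.

```python
import random

# Hypothetical validation scores for six candidate base classifiers (made up).
SCORES = [0.62, 0.81, 0.78, 0.55, 0.84, 0.69]
BASELINE = 0.70  # assumed cut-off separating useful from harmful members


def toy_fitness(mask):
    """Toy stand-in for ensemble quality: selected classifiers above
    BASELINE add to the fitness, those below subtract from it."""
    return sum(s - BASELINE for s, bit in zip(SCORES, mask) if bit)


def evolve_subset(n_items, fitness, pop_size=20, generations=30,
                  crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return the best bitmask found and its fitness."""
    rng = random.Random(seed)
    # Each individual is a bitmask: bit i == 1 keeps base classifier i.
    population = [[rng.randint(0, 1) for _ in range(n_items)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best individual forward
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_items - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [b ^ 1 if rng.random() < mutation_rate else b
                     for b in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best, fitness(best)


best_mask, best_fit = evolve_subset(len(SCORES), toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected classifier indices:", selected, "fitness:", round(best_fit, 3))
```

In a real pipeline the fitness would presumably be computed from the validation performance of the ensemble built from the selected classifiers. The same bitmask encoding carries over to feature selection, with one bit per feature.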

Summary

INTRODUCTION

The ability to accurately assess the creditworthiness of customers who apply for loans, and to manage the corresponding risk, is key to the development of the modern financial industry. In imbalanced credit scoring data, positive samples are defaulting customers and negative samples are non-defaulting customers. The imbalance arises because, in most real-world cases, customers who pay their bills on time far outnumber those who default. Both statistics-based and machine learning-based credit scoring models struggle to make accurate predictions when imbalanced data are input directly. Enhancing the predictive ability of credit scoring models on imbalanced data is the first motivation of this study. Developing an effective feature selection approach is a prerequisite for lower data processing costs, a better understanding of the data, and better-performing credit scoring models. Multiple poorly performing or correlated base classifiers in an ensemble model may degrade the ensemble's performance.
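A small worked example shows why directly inputting imbalanced data is a problem. The class counts below are made up, and accuracy and recall are used only for illustration; they are not necessarily among the paper's four evaluation indicators. A trivial model that labels every customer as non-defaulting scores high accuracy while detecting no defaulters.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (defaulters) that the model detects."""
    actual_pos = sum(1 for t in y_true if t == positive)
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical dataset: 950 non-defaulters (0) and 50 defaulters (1).
y_true = [0] * 950 + [1] * 50
# Trivial baseline: predict "non-defaulting" for every customer.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall on defaulters={rec:.2f}")
# prints: accuracy=0.95, recall on defaulters=0.00
```

The 95% accuracy reflects only the class ratio, which is why imbalanced data are typically rebalanced or evaluated with indicators that are sensitive to the minority class.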

RELATED WORK

FEATURE SELECTION

EXPERIMENTAL DESIGN

EXPERIMENTAL SETTING

The raw dataset was divided as follows

EXPERIMENTAL ANALYSIS

Method

CONCLUSION AND FUTURE WORK

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 6
License type: CC BY 4.0
