Abstract

Many quantitative credit scoring models have been developed for credit risk assessment. Irrelevant and redundant features may degrade the performance of credit risk classification. Feature selection with metaheuristic techniques can be applied to identify the most significant features. However, metaheuristic techniques suffer from issues such as becoming trapped in local optima and premature convergence. Therefore, in this article, a hybrid variable neighborhood search and estimation of distribution technique with an elitist population strategy is proposed to identify the optimal feature subset. Variable neighborhood search with the elitist population strategy directs the local search in order to improve ergodicity, avoid premature convergence, and escape local optima during the search process. The probabilistic model attempts to capture the probability distribution of the promising solutions, which are biased towards the global optimum. The proposed technique has been tested on both publicly available credit datasets and a real-world credit dataset from China. Experimental analysis demonstrates that it outperforms existing techniques on large-scale credit datasets with high dimensionality, making it well suited for feature selection in credit risk classification.
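
To make the idea in the abstract concrete, the following is a minimal toy sketch, not the authors' algorithm. It assumes a binary mask per feature subset, a univariate probabilistic model (one inclusion probability per feature), a basic variable neighborhood search that flips k bits, and a hypothetical `fitness` function that stands in for classifier performance. All names and parameter values are illustrative assumptions.

```python
import random

def fitness(mask, relevant):
    """Toy objective (assumption): +1 per relevant feature kept, -0.5 per extra one."""
    hits = sum(1 for i in relevant if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras

def shake(mask, k, rng):
    """k-th neighborhood: flip k randomly chosen bits of the mask."""
    out = list(mask)
    for i in rng.sample(range(len(out)), k):
        out[i] = 1 - out[i]
    return out

def vns(mask, relevant, k_max, rng):
    """Basic VNS: widen the neighborhood until a move improves the fitness."""
    best, best_fit = mask, fitness(mask, relevant)
    k = 1
    while k <= k_max:
        cand = shake(best, k, rng)
        cand_fit = fitness(cand, relevant)
        if cand_fit > best_fit:
            best, best_fit, k = cand, cand_fit, 1   # accept and restart at k = 1
        else:
            k += 1                                  # try a larger neighborhood
    return best

def hybrid_search(n_features, relevant, pop_size=20, n_elite=5,
                  generations=30, k_max=3, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n_features      # univariate model: P(feature j is selected)
    elite = []
    for _ in range(generations):
        # Sample candidate subsets from the current probabilistic model.
        sampled = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(pop_size)]
        # Elitist strategy: the previous elite competes with the new samples.
        pop = sorted(sampled + elite,
                     key=lambda m: fitness(m, relevant), reverse=True)
        # Refine the best candidates with local search.
        elite = [vns(m, relevant, k_max, rng) for m in pop[:n_elite]]
        # Re-estimate the marginals from the elite, clamped to keep diversity.
        for j in range(n_features):
            freq = sum(m[j] for m in elite) / n_elite
            probs[j] = min(0.95, max(0.05, freq))
    return max(elite, key=lambda m: fitness(m, relevant))

relevant = {0, 3, 7}                # hypothetical indices of informative features
best = hybrid_search(12, relevant)
```

In a realistic setting, `fitness` would be replaced by a cross-validated classifier score on the selected columns; the toy objective is used here only so the sketch runs without data.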

Highlights

  • Credit risk assessment is one of the most important issues for serving small- and medium-sized enterprises (SMEs) in the commercial banking industry

  • Feature selection and feature extraction are two common methods for dimensionality reduction [5,6,7]. Both methods help simplify the feature set and can enhance the stability and generalization of the classifier, improving learning ability, efficiency, and convenience. The major difference between the two methods is that features obtained by feature extraction are not the original features, while features obtained by feature selection are a subset of the original features [8, 9]

  • The categorical features are transformed into numerical ones and converted into binary strings, missing data can be filled with the median value, and each original feature is linearly scaled to the range [0, 1]
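
The preprocessing steps in the last highlight can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the integer coding of categories, the helper names, and the toy `income` and `housing` columns are all hypothetical, and the conversion into binary strings is not shown because the source does not specify it.

```python
from statistics import median

def encode_categories(column):
    """Map each distinct category to an integer code (assumed coding: sorted order)."""
    codes = {c: i for i, c in enumerate(sorted(set(column)))}
    return [codes[c] for c in column]

def fill_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    m = median(observed)
    return [m if v is None else v for v in column]

def min_max_scale(column):
    """Linearly rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical toy columns, not taken from any dataset in the article.
income = [30.0, None, 50.0, 70.0]
housing = ["rent", "own", "rent", "mortgage"]

income_scaled = min_max_scale(fill_median(income))
housing_scaled = min_max_scale(encode_categories(housing))
```

Integer coding imposes an arbitrary order on the categories; whether that is acceptable depends on the downstream classifier, which is a design choice the highlight leaves open.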

Summary

Introduction

Credit risk assessment is one of the most important issues for serving small- and medium-sized enterprises (SMEs) in the commercial banking industry. Data availability in credit lending has been significantly enhanced by information technology. Multisource data, such as basic personal information, economic behavior, and social activity, are required for credit risk scoring in order to take preventive measures during the credit monitoring process and prioritize recovery efforts. These large-scale data are commonly high dimensional, which leads to the curse of dimensionality. Feature selection and feature extraction are two common methods for dimensionality reduction [5,6,7]. Both methods help simplify the feature set and can enhance the stability and generalization of the classifier, improving learning ability, efficiency, and convenience. Optimal selection based on subspace searching can remove redundant features, which makes its performance always better than that of optimal selection based on sorting methods [14].

