A Study on Credit Scoring Models with different Feature Selection and Machine Learning Approaches

Rahul Pal, Shrawan Trivedi, Shrikanth Kapali

doi:10.2139/ssrn.3743552

Abstract

This computational study focuses on credit scoring. It shows that credit scoring models can be improved by using different feature selection methods and machine learning classifiers. The paper presents a comparative analysis of machine learning classifiers such as Bayesian, Naive Bayes, Support Vector Machine (SVM), Decision Tree, and Random Forest, combined with three feature selection techniques: Chi-Square, Information Gain, and Gain Ratio. Model performance is assessed with several metrics, including false positive rate, F-measure, and training time. The analysis identifies the best-performing classifier and feature selection algorithm. In this study, the combination of Random Forest and Information Gain performs best among all combinations, with high accuracy and a low false positive rate. However, the training time of this combination was higher. The results of SVM were comparable to those of Random Forest.
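
Two of the feature selection criteria (Information Gain and Gain Ratio) and two of the evaluation metrics (false positive rate and F-measure) named in the abstract can be illustrated with a minimal sketch in plain Python. It assumes discrete feature values and binary class labels. The function names and the toy data are hypothetical and are not taken from the study, whose implementation is not described in this section.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


def false_positive_rate(y_true, y_pred, positive=1):
    """FP / (FP + TN): share of actual negatives predicted as positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def f_measure(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall, 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 8 applicants, label 1 is the positive class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "informative": ["y", "y", "y", "y", "n", "n", "n", "n"],  # separates classes
    "uninformative": ["a", "b", "a", "b", "a", "b", "a", "b"],  # unrelated to label
}

# Rank features by information gain, highest first.
ranking = sorted(features, key=lambda f: information_gain(features[f], labels),
                 reverse=True)

# Hypothetical classifier output with one false positive and one false negative.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
fpr = false_positive_rate(labels, predictions)
f1 = f_measure(labels, predictions)
```

On this toy data the informative feature is ranked first, and the sample predictions yield a false positive rate of 0.25 and an F-measure of 0.75. Gain Ratio differs from Information Gain only in its normalisation, which penalises features with many distinct values.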

Full Text