Abstract

Background/Objectives: Recent studies have emphasized ensemble models over single classifiers for credit scoring problems. The objective of this study is to build a heterogeneous ensemble classifier with improved classification accuracy.

Methods: This study develops a heterogeneous ensemble classifier using Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB) and Support Vector Machine (SVM) as base classifiers, and RF, LR and SVM as meta-classifiers. A feature selection algorithm based on the random forest technique selects the best features. Stacking and voting methods are then used to build the ensemble model over the six base classifiers.

Findings: The ensemble classifier outperforms the single classifiers SVM, DT, RF, NB, KNN and LR, with an accuracy of 91.56% on the Australian dataset and 84.35% on the German dataset.

Novelty: The proposed model combines stacking and majority voting for ensemble classification. Stacking is first applied to the base classifiers in two levels. In the first level, the training dataset is split into 10 folds for cross-validation, the output of each base classifier is collected, and the dataset is updated with these meta-features. In the second level, three meta-classifiers (MC), namely LR, SVM and RF, are used. Majority voting is then applied to the outputs of these meta-classifiers to produce the final prediction.

Keywords: Credit scoring; ensemble model; SVM; DT; RF; NB; KNN; LR
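The two-level procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The synthetic data from `make_classification`, the default hyperparameters, the train/test split, and the choice to append the meta-features to the original features (rather than replace them) are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a credit dataset (assumption: NOT the Australian or German data).
X, y = make_classification(n_samples=400, n_features=14, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The six base classifiers named in the abstract, with illustrative settings.
base_classifiers = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
    GaussianNB(),
    SVC(random_state=0),
]

# Level 1: out-of-fold predictions from 10-fold cross-validation become meta-features,
# so no training row is predicted by a model that saw its label.
meta_train = np.column_stack(
    [cross_val_predict(clf, X_train, y_train, cv=10) for clf in base_classifiers]
)

# Refit each base classifier on the full training set to build meta-features for unseen rows.
meta_test = np.column_stack(
    [clf.fit(X_train, y_train).predict(X_test) for clf in base_classifiers]
)

# "Update the dataset with the meta-features" (assumption: append them as extra columns).
Z_train = np.hstack([X_train, meta_train])
Z_test = np.hstack([X_test, meta_test])

# Level 2: three meta-classifiers trained on the updated dataset.
meta_classifiers = [
    LogisticRegression(max_iter=1000),
    SVC(random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
votes = np.column_stack(
    [mc.fit(Z_train, y_train).predict(Z_test) for mc in meta_classifiers]
)

# Majority voting: with three binary voters, a class wins when at least two agree.
y_pred = (votes.sum(axis=1) >= 2).astype(int)
accuracy = float((y_pred == y_test).mean())
print(f"Ensemble accuracy on the held-out split: {accuracy:.3f}")
```

Using an odd number of meta-classifiers avoids ties in a binary majority vote, which may be one reason three are used at the second level.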

Highlights

  • A credit scoring model is an analysis tool used to determine the creditworthiness of a loan applicant from historical data by estimating the default probability

  • The models are built by training single base classifiers and integrating their outputs with an ensemble strategy to enhance performance

  • Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), Naive Bayes (NB) and Decision Tree (DT) are used as base models
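As a rough illustration of how the six base models listed above might be compared on their own, the sketch below scores each one with 10-fold cross-validation after a random-forest-based feature selection step. It is a sketch under stated assumptions, not the study's code: the synthetic data, the `median` importance threshold in `SelectFromModel`, and all hyperparameters are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a credit dataset (assumption: not the data used in the study).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# The six base models, keyed by the abbreviations used in the text.
base_models = {
    "SVM": SVC(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in base_models.items():
    # Feature selection sits inside the pipeline so it is refit on each training fold,
    # which keeps information from the validation fold out of the selection step.
    pipeline = make_pipeline(
        SelectFromModel(
            RandomForestClassifier(n_estimators=50, random_state=0),
            threshold="median",  # assumption: keep features at or above median importance
        ),
        model,
    )
    scores[name] = cross_val_score(pipeline, X, y, cv=10, scoring="accuracy").mean()

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Single-model scores obtained this way give the baseline against which an ensemble's accuracy can be judged.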


Summary

Introduction

A credit scoring model is an analysis tool used to determine the creditworthiness of a loan applicant from historical data by estimating the default probability. Ensemble modeling has been shown to improve the performance of credit scoring models. A credit scoring model is used to assess the credit risk of a new applicant (2) or to assess the likelihood of a default using information from a previous loan applicant (3). The two most widely used statistical methods in credit scoring are Logistic Regression (LR) and Linear Discriminant Analysis (LDA). Machine learning classification approaches such as K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Classification and Regression Tree (CART), Genetic Algorithms (GA), and Artificial Neural Networks (ANN) are also used extensively in credit scoring.
