Abstract

An accurate prediction of loan default is crucial in credit risk evaluation. A slight deviation from true accuracy can often cause financial losses to lending institutes. This study describes the non-parametric approach that compares five different machine learning classifiers combined with a focus on sufficiently large datasets. It presents the findings on various standard performance measures such as accuracy, precision, recall and F1 scores in addition to Receiver Operating Curve-Area Under Curve (ROC-AUC). In this study, various data pre-processing techniques including normalization and standardization, imputation of missing values and the handling of imbalanced data using SMOTE will be discussed and implemented. Also, the study examines the use of hyper-parameters in various classifiers. During the model construction phase, various pipelines feed data to the five machine learning classifiers, and the performance results obtained from the five machine learning classifiers are based on sampling with SMOTE or hyper-parameters versus without SMOTE and hyper-parameters. Each classifier is compared to another in terms of accuracy during training and prediction phase based on out-of-sample data. The 2 data sets used for this experiment contain 1000 and 30,000 observations, respectively, of which the training/testing ratio is 80:20. The comparative results show that random forest outperforms the other four classifiers both in training and actual prediction.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.