Optimizing hyper-parameters of neural networks with swarm intelligence: A novel framework for credit scoring

Runchi Zhang, Zhiyi Qiu, Jie Zhang

doi:10.1371/journal.pone.0234254

Abstract

Neural networks are widely used in automatic credit scoring systems because of their high accuracy and efficiency. However, without prior knowledge it is difficult to determine a suitable set of hyper-parameters, which limits their application in practice. This paper presents a novel credit-scoring framework based on neural networks whose hyper-parameters are tuned by the optimal swarm intelligence (SI) algorithm. The framework consists of three steps. Step 1, pre-processing: imputation, normalization, and re-ordering of the samples. Step 2, training: SI algorithms optimize the hyper-parameters of back-propagation artificial neural networks (BP-ANN), using the area under the curve (AUC) as the evaluation function. Step 3, testing: the model optimized in Step 2 is applied to predict new samples. The results show that the proposed framework searches the hyper-parameter space efficiently and finds the optimal set of hyper-parameters with acceptable time complexity, which enhances the fitting and generalization ability of the BP-ANN. Compared with existing credit-scoring models, the proposed model predicts with higher accuracy. Additionally, the model is more robust, as measured by the difference in performance between the training and testing phases.
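Step 1 and the evaluation function of Step 2 can be made concrete with a small pure-Python sketch. The abstract does not say which imputation or normalization methods are used, so mean imputation and min-max scaling below are assumptions, as is reading AUC as the area under the ROC curve (computed here with the Mann-Whitney rank statistic). The function names `impute_mean`, `min_max_normalize`, `reorder`, and `auc` are hypothetical, not the authors' code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def impute_mean(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Step 1a (assumed method): replace missing values (None) with the column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Step 1b (assumed method): scale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(r)] for r in rows]


def reorder(rows: List[List[float]], labels: List[int],
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Step 1c: shuffle samples and labels with the same permutation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step 2 evaluation: ROC AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`: three of the four positive/negative pairs are ranked correctly. In a real pipeline the column means and min/max values would be computed on the training samples only and then reused for the test samples.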

Highlights

Credit scoring refers to the process of using statistics to classify credit applicants into different risk categories [1], in order to “determine the likelihood that a prospective borrower will default on a loan” [2].
This paper carries out an experiment to test whether the back-propagation artificial neural network (BP-ANN) model trained by a swarm intelligence algorithm outperforms prevalent classical models and several typical hybrid or ensemble models constructed in recent literature [29,30,31,32,33,34] in the context of credit scoring.
We report the performance of our model as the number of hidden layers in the back-propagation (BP)-ANN increases, followed by an analysis of computational complexity.
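The swarm-based hyper-parameter search can be sketched with particle swarm optimization (PSO), used here only as one representative SI algorithm; this section does not name the algorithms the paper compares. Everything specific below is an assumption: the hyper-parameters searched (log10 learning rate and number of hidden units), the coefficients `w`, `c1`, `c2`, and the hypothetical fitness `surrogate_validation_auc`, a smooth toy function standing in for "train a BP-ANN with these settings and return its validation AUC".

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(fitness: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, n_iters: int = 50,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO that maximizes `fitness` inside the box given by `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards own best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to the box
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:  # swarm best updated immediately (asynchronous PSO)
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def surrogate_validation_auc(params: List[float]) -> float:
    """HYPOTHETICAL stand-in for training a BP-ANN and returning its validation AUC.

    params[0] = log10(learning rate), params[1] = number of hidden units (rounded).
    The toy surface peaks at 0.85 for learning rate 1e-2 and 30 hidden units.
    """
    log_lr, hidden = params[0], round(params[1])
    return 0.85 - 0.02 * (log_lr + 2.0) ** 2 - 0.0001 * (hidden - 30) ** 2


if __name__ == "__main__":
    best, best_auc = pso_maximize(surrogate_validation_auc,
                                  bounds=[(-4.0, 0.0), (4.0, 64.0)])
    print(f"log10(lr)={best[0]:.2f}, hidden units={round(best[1])}, AUC={best_auc:.4f}")
```

Because each fitness call stands for one full BP-ANN training run, this search costs `n_particles * (n_iters + 1)` trainings (1,020 with the defaults above), and each training becomes more expensive as hidden layers are added; that product is the main driver of the time complexity of such a framework.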

Summary

Introduction

Credit scoring refers to the process of using statistics to classify credit applicants into different risk categories [1], in order to “determine the likelihood that a prospective borrower will default on a loan” [2]. A variety of statistical models are applied in this process. Linear discriminant analysis (LDA), a simple parametric statistical model, is one of the first models used for credit scoring, but it has been questioned because it presumes that the data are normally distributed [6]. This deficiency of LDA is largely overcome by more sophisticated models such as logistic regression, k

Objectives

Methods

Results

Conclusion