Abstract

The financial credibility of a person is a factor used to determine whether a loan should be approved, and it is quantified by a ‘credit score,’ which is calculated from a variety of factors, including past performance on debt obligations and profiling, among others. Machine learning has been widely applied over the years to automate the development of effective credit scoring models. Yet studies show that developing a robust credit scoring model may take longer than a year; thus, if the behavior of customers changes over time, the model will be outdated even before its deployment. In this paper, we make three anonymized real-world credit scoring datasets available alongside the results obtained. For each of these datasets, we verify whether the credit scoring task should be treated as an ephemeral scenario, since many of the variables may drift over time. If so, data stream mining techniques should be used, as they were tailored for incremental learning and for detecting and adapting to changes in the data distribution. We therefore compare traditional batch machine learning algorithms with data stream algorithms under different validation schemes, using both the Kolmogorov–Smirnov and Population Stability Index metrics. Furthermore, we provide insights on the importance of features according to their Information Value, Mean Decrease Impurity, and Mean Positional Gain metrics, the last of which depicts changes in the importance of features over time. For two of the three tested datasets, the results obtained by data stream learners are comparable to those of predictive models currently in use, thus showing the efficiency of data stream classification for the credit scoring task.
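The abstract names the two evaluation metrics without defining them. As a minimal sketch, assuming the standard textbook definitions rather than the paper's exact implementation, the Kolmogorov–Smirnov statistic can be read as the largest gap between the empirical score distributions of good and bad payers, and the Population Stability Index as a binned divergence between a reference score sample and a newer one. The function names, the equal-width binning, and the `eps` floor for empty bins are illustrative assumptions.

```python
import math
from bisect import bisect_right


def ks_statistic(scores_good, scores_bad):
    """Largest absolute gap between the empirical CDFs of two score samples."""
    good = sorted(scores_good)
    bad = sorted(scores_bad)
    thresholds = sorted(set(good) | set(bad))
    return max(
        abs(bisect_right(good, t) / len(good) - bisect_right(bad, t) / len(bad))
        for t in thresholds
    )


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a newer sample.

    Assumption: equal-width bins spanning the reference sample's range;
    values outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant sample

    def shares(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each share at eps so the logarithm stays defined.
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Under these definitions, a higher KS value indicates better separation between the two classes, while a PSI near zero indicates that the score distribution has remained stable between the two samples.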
