Abstract

Data science and machine-learning techniques help banks optimize enterprise operations, enhance risk analyses and gain a competitive advantage. There is a vast amount of research on credit risk, but to our knowledge, none of it uses a credit registry as a data source to model the probability of default for individual clients. The goal of this paper is to evaluate different machine-learning models in order to create an accurate model for credit risk assessment, using data from the real credit registry of the Central Bank of the Republic of North Macedonia. We strongly believe that the model developed in this research will be an additional source of valuable information for commercial banks, because it leverages historical data for the entire population of the country across all commercial banks. To this end, we compare five machine-learning models for classifying credit risk data: logistic regression, decision tree, random forest, support vector machines (SVM) and neural network. We evaluate the five models using different machine-learning metrics, and we propose a model, together with a detailed methodology, that is based on credit registry data from the central bank and can predict credit risk from the credit history of the country's population. Our results show that the best accuracy is achieved by the decision tree on imbalanced data, both with and without scaling, followed by random forest and logistic regression.
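The abstract ranks the models by accuracy on imbalanced data. It does not list the other metrics used. The sketch below assumes the common confusion-matrix metrics (accuracy, precision, recall, F1) and encodes default as 1 and non-default as 0. It illustrates why accuracy should be read alongside minority-class metrics when defaulters are rare. The function name and the toy labels are hypothetical, not taken from the paper.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary task: 1 = default, 0 = non-default (assumed encoding)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good clients correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good clients wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical, heavily imbalanced toy labels: 95 non-defaulters, 5 defaulters.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that never predicts default still scores 95% accuracy.
always_good = [0] * 100
print(classification_metrics(y_true, always_good))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}

# A classifier that wrongly flags 10 good clients but catches 4 of the 5 defaulters:
# lower accuracy (0.89), but a recall of 0.8 on the minority class.
catches_most = [0] * 85 + [1] * 10 + [1] * 4 + [0]
print(classification_metrics(y_true, catches_most))
```

In practice a library implementation, such as scikit-learn's metrics module, would normally replace this hand-written helper. It is spelled out here only to make the arithmetic visible.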

Highlights

  • The results show the perspective of central banks when doing credit risk analysis, which differs substantially from the traditional credit risk analyses of commercial banks; these use more detailed data per client but lack information about the same client held by other banks

  • Their work compares performance indicators of the prediction methods before and after data balancing. Their results show that sampling strategies (such as the synthetic minority oversampling technique, SMOTE) improve the performance of prediction models compared with unbalanced data

  • The results showed that the models achieved the highest accuracy using imbalanced data without balancing, followed by a training set balanced with SMOTE without scaling
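The highlights refer to balancing the training set with SMOTE. Below is a minimal pure-Python sketch of SMOTE-style oversampling. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. This is a simplified variant that picks the base sample at random, not necessarily the exact procedure or parameters used in the paper. The helper name `smote`, its parameters and the toy feature values are hypothetical. As the highlight implies, oversampling would be applied only to the training split, so that synthetic points never enter the evaluation data.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: create n_synthetic minority-class points by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours (SMOTE-style oversampling)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    points = [[float(v) for v in x] for x in minority]
    k = min(k, len(points) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(x, points[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(points))  # pick a base minority sample
        j = rng.choice(neighbours[i])   # pick one of its nearest neighbours
        gap = rng.random()              # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(points[i], points[j])])
    return synthetic


# Hypothetical minority-class (defaulter) rows with two numeric features.
defaulters = [[0.10, 0.20], [0.15, 0.25], [0.30, 0.10], [0.20, 0.30]]
new_points = smote(defaulters, n_synthetic=4, k=2, seed=42)
balanced_minority = defaulters + new_points
print(len(balanced_minority))  # 8
```

The synthetic points interpolate between real minority samples, so they always fall inside the region spanned by existing defaulters. This is why SMOTE adds variety without simply duplicating rows.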


Summary

Introduction

The digital transformation of banks' business processes, through the introduction of big data solutions that enhance their operations, is inevitable. The adoption of technologies and infrastructure for big data sets presents a great opportunity to enhance the operations and increase the revenue of banks, and of enterprises in general, by discovering new knowledge in their existing datasets (Fang and Zhang 2016; Yin and Kaynak 2015). With the rise of big data as an emerging field, data science has taken on its role as a modern and important scientific approach: it makes it possible to gain new insights and knowledge from big data and offers businesses a key competitive advantage. The primary role of data science is to support banks and businesses in decision making and to derive insights and future predictions that help them operate more efficiently than their competitors. Predictive analytics is the set of technologies that combines data science, machine learning, and predictive and statistical modeling to generate predictions for different expert systems, such as risk, liquidity, customer churn, fraud and revenue prediction, and to support informed decision making (Lackovic et al. 2016; Provost and Fawcett 2013).
