Abstract

In the age of big data, machine learning models are widely used for default risk prediction. Imbalanced datasets and redundant features are two main problems that can reduce the performance of these models. To address them, this study analyzes the effect of different balance ratios and of the order in which features are selected. We first apply data rebalancing and feature selection to obtain, for each dataset, 32 derived datasets with varying balance ratios and feature combinations. Second, we propose a comprehensive metric model based on multiple machine learning algorithms (CMM-MLA) to select the best derived dataset, that is, the one with the optimal balance ratio and feature combination. Finally, a convolutional neural network (CNN) is trained on the selected derived dataset, and the performance of our approach is evaluated in terms of type-II error, accuracy, G-mean, and AUC. This study makes two contributions. First, the optimal balance ratio is determined from classification accuracy, which addresses a shortcoming of existing research, where samples are either left imbalanced or rebalanced to a fixed 1 : 1 ratio, and helps ensure the accuracy of the classification model. Second, a comprehensive metric model based on machine learning algorithms is proposed, which can simultaneously find the best balance ratio and the optimal feature selection. The experimental results show that our method noticeably improves the performance of the CNN, and that the CNN outperforms the other four commonly used machine learning models in default risk prediction on four benchmark datasets.
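
The four evaluation measures named above can be illustrated with a short sketch. The abstract does not give formulas, so the definitions below are the conventional ones and are assumptions here: the default class is labelled 1 (positive), type-II error is the share of actual defaulters predicted as non-default, G-mean is the geometric mean of sensitivity and specificity, and AUC is computed as the probability that a random defaulter is scored above a random non-defaulter. The function names `confusion_counts`, `evaluate`, and `auc` are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, assuming label 1 = default (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Accuracy, type-II error, and G-mean under the assumed definitions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # defaulters correctly flagged
    specificity = tn / (tn + fp)  # non-defaulters correctly cleared
    return {
        "accuracy": (tp + tn) / len(y_true),
        "type_ii_error": fn / (tp + fn),  # missed defaulters
        "g_mean": math.sqrt(sensitivity * specificity),
    }


def auc(y_true, scores):
    """Pairwise (rank-based) AUC; ties between scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(evaluate(y_true, y_pred))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

On imbalanced data, accuracy alone can look good even when most defaulters are missed, which is presumably why it is reported here alongside type-II error and G-mean.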
