Abstract

Financial decision-making, particularly loan approval, depends on accurate risk prediction. To improve prediction accuracy, this study compares four machine learning models: Logistic Regression, XGBoost, an Artificial Neural Network (ANN), and a hybrid of XGBoost and Logistic Regression (XGB+LR). These models were selected for their differing capacities to capture complex patterns and relationships in the data, which may improve loan default prediction. The models were trained and validated on a dataset prepared with one-hot encoding, feature selection, and scaling, and their hyperparameters were tuned extensively. Each model was evaluated using the Area Under the ROC Curve (AUC) and Accuracy (ACC). XGBoost performed best, achieving an AUC of 0.798 and an ACC of 0.861 on the validation set. A feature importance analysis based on the XGBoost model indicated that Credit_Score and Loan_Amount were the primary factors influencing loan approval decisions. Although slight overfitting was observed in the models, the results support the potential of machine learning to improve financial decision-making. Future work may include advanced regularization techniques, further hyperparameter optimization, and a broader feature set.
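
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and ACC can be computed from predicted default probabilities. It is not the study's evaluation code: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 threshold is an assumed default, not a value from the study.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(1 for p, t in zip(preds, y_true) if p == t)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n_pos * n_neg) and is meant for clarity only.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study's dataset).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc_value = roc_auc(y_true, y_score)
acc_value = accuracy(y_true, y_score)
print(f"AUC={auc_value:.3f} ACC={acc_value:.3f}")
```

AUC depends only on how the scores rank the cases, while ACC depends on the chosen threshold, which is one reason the two metrics can differ for the same model.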
