Naive Bayes using the expectation-maximization algorithm for reject inference

Billie Anderson

doi:10.1080/23737484.2022.2106325

Abstract

In the last several years, there has been significant research on applying semi-supervised machine learning models to the reject inference problem. When a financial institution builds a model to predict default among credit applicants, it observes the good/bad loan outcome only for the applicants it accepted, so a model trained on those applicants alone is inherently biased. Reject inference is used to infer the good or bad loan status of credit applicants who were rejected by a financial institution. This paper presents a reject inference technique in which a semi-supervised framework is developed using a Naive Bayes model. The framework uses the expectation-maximization (EM) algorithm, together with a bootstrapping approach, to incorporate rejected applicants into the estimation of the model parameters. The proposed method has an advantage over traditional reject inference methods because the rejected applicant data participate in the estimation of the model parameters, thus avoiding the extrapolation problem. The Naive Bayes model using the EM algorithm is compared to logistic regression and several semi-supervised techniques.
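To make the idea of rejected applicants participating in parameter estimation concrete, the following is a minimal sketch of semi-supervised Naive Bayes fitted with EM. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian class-conditional densities, the coding of 1 as "bad", the function names `fit_gaussian_nb_em` and `predict_bad_proba`, and the fixed iteration count are all hypothetical choices, and the bootstrapping step mentioned in the abstract is omitted.

```python
import numpy as np


def _log_joint(X, prior, mean, var):
    # log p(x, class) under the Naive Bayes assumption: features are
    # conditionally independent given the class (Gaussian here, by assumption).
    diff = X[:, None, :] - mean[None, :, :]
    log_pdf = -0.5 * (np.log(2.0 * np.pi * var)[None, :, :] + diff**2 / var[None, :, :])
    return log_pdf.sum(axis=2) + np.log(prior)[None, :]


def _posterior(X, prior, mean, var):
    # Normalize the joint over classes with a numerically stable log-sum-exp.
    lj = _log_joint(X, prior, mean, var)
    m = lj.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))
    return np.exp(lj - log_norm)


def fit_gaussian_nb_em(X_acc, y_acc, X_rej, n_iter=50, var_floor=1e-6):
    """Fit a two-class Gaussian Naive Bayes model on accepted (labeled)
    and rejected (unlabeled) applicants. Labels: 0 = good, 1 = bad (assumed)."""
    X_acc = np.asarray(X_acc, dtype=float)
    X_rej = np.asarray(X_rej, dtype=float)
    y_acc = np.asarray(y_acc, dtype=int)
    X = np.vstack([X_acc, X_rej])
    n_acc = len(X_acc)

    # Responsibilities: accepted rows are fixed at their observed labels;
    # rejected rows start at zero, so the first M-step uses accepted data only.
    R = np.zeros((len(X), 2))
    R[np.arange(n_acc), y_acc] = 1.0

    for _ in range(n_iter):
        # M-step: weighted estimates of class priors, means, and variances.
        weights = R.sum(axis=0)
        prior = weights / weights.sum()
        mean = (R.T @ X) / weights[:, None]
        var = np.empty_like(mean)
        for k in range(2):
            var[k] = (R[:, k, None] * (X - mean[k]) ** 2).sum(axis=0) / weights[k]
        var += var_floor  # guard against zero variance

        # E-step: update class probabilities for the rejected applicants only.
        R[n_acc:] = _posterior(X_rej, prior, mean, var)

    return {"prior": prior, "mean": mean, "var": var}


def predict_bad_proba(X, params):
    # Posterior probability of the "bad" class for each row of X.
    X = np.asarray(X, dtype=float)
    return _posterior(X, params["prior"], params["mean"], params["var"])[:, 1]
```

Because the rejected rows enter every M-step through their current class probabilities, the fitted parameters reflect the feature distribution of the rejected population rather than being extrapolated from accepted applicants alone. The sketch assumes both classes appear among the accepted applicants.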

Full Text