Abstract

Classification algorithms predict the class membership of an unknown record. Methods such as logistic regression or the naïve Bayes algorithm produce a score related to the likelihood that a record belongs to a particular class. A cutoff threshold is then defined to delineate the prediction of one class over another. This paper derives analytic results for selecting an optimal cutoff threshold for a classification algorithm that is used to inform a two-action decision, under both risk aversion and risk neutrality. The results provide insight into how the optimal cutoff thresholds relate to the associated costs and to the sensitivity and specificity of the algorithm, for both risk-neutral and risk-averse decision makers. The optimal risk-averse threshold is not reliably above or below the optimal risk-neutral threshold; rather, the relation depends on the parameters of a particular application. The results further show that the risk-averse optimal threshold is insensitive to the size of the data set and to the magnitude of the costs, but is sensitive to the proportion of positive records in the data and to the ratio of costs. Numeric examples and sensitivity analysis provide further insight. Results show that the percent value gap from a misspecified risk attitude increases as the specificity of the classification algorithm decreases.
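
To make the idea of cost-based cutoff selection concrete, the following is a minimal sketch that searches candidate thresholds empirically on a small labeled sample. It is not the analytic derivation described above. All function names, the toy data, and the cost values are hypothetical, and the use of an exponential-utility certainty equivalent to represent risk aversion is an assumption made for illustration only.

```python
import math


def per_record_costs(scores, labels, threshold, cost_fp, cost_fn):
    """Cost incurred on each record when predicting positive iff score >= threshold."""
    costs = []
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 0:
            costs.append(cost_fp)   # false positive
        elif not predicted_positive and label == 1:
            costs.append(cost_fn)   # false negative
        else:
            costs.append(0.0)       # correct prediction
    return costs


def risk_neutral_objective(costs):
    """Average cost per record (risk-neutral criterion)."""
    return sum(costs) / len(costs)


def risk_averse_objective(costs, r=0.5):
    """Certainty-equivalent cost under exponential utility (assumed form, r > 0)."""
    return math.log(sum(math.exp(r * c) for c in costs) / len(costs)) / r


def best_threshold(scores, labels, cost_fp, cost_fn, objective):
    """Return the candidate threshold that minimizes the given objective."""
    # Candidates: every observed score, plus +inf (never predict positive).
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: objective(per_record_costs(scores, labels, t, cost_fp, cost_fn)),
    )


# Hypothetical toy data: classifier scores and true classes (1 = positive).
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

t_neutral = best_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0,
                           objective=risk_neutral_objective)
t_averse = best_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0,
                          objective=risk_averse_objective)
```

Swapping the `objective` argument is the only difference between the two searches, which keeps the comparison between risk attitudes explicit. On real data the two thresholds may or may not coincide, depending on the costs and the class proportions.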
