Abstract

Understanding Linear Classifiers in the Face of Severe Data Imbalance

In "Linear Classifiers Under Infinite Imbalance," Paul Glasserman and Mike Li tackle binary classification when the data are severely imbalanced, a common dilemma in fields such as healthcare and finance. Building on earlier work by Owen, they examine the behavior of logistic regression and extend the analysis to a broader class of linear discriminant functions. Their key contribution is a proof that the coefficient vectors of these classifiers have well-defined infinite-imbalance limits, with explicit expressions for those limits that distinguish between classifiers with subexponential and exponential weight functions. This distinction clarifies how a classifier can be adjusted under extreme imbalance to improve specificity or sensitivity in predictions. The authors also connect their findings to robustness and conservatism in classification decisions, offering insight into designing classifiers that are optimal against the most challenging alternatives. The practical implications of the theory are illustrated through numerical examples and a credit risk case study, giving a new perspective on managing classification tasks in the face of extreme imbalance.
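The infinite-imbalance phenomenon the abstract describes can be seen empirically. The sketch below (an illustration of the general idea, not the authors' code; the synthetic Gaussian data and all parameter choices are assumptions) fits logistic regression while the majority class grows and the minority class stays fixed: the intercept drifts toward minus infinity, while the slope coefficient settles toward a finite limit, which is the kind of coefficient-vector limit the paper characterizes.

```python
# Illustrative sketch only: synthetic one-dimensional data, not from the paper.
# Majority (negative) class ~ N(0, 1); rare minority (positive) class ~ N(2, 1).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_minority = 200
x_min = rng.normal(loc=2.0, scale=1.0, size=n_minority)  # fixed rare class

results = []
for n_majority in (1_000, 10_000, 100_000):  # increasingly severe imbalance
    x_maj = rng.normal(loc=0.0, scale=1.0, size=n_majority)
    X = np.concatenate([x_maj, x_min]).reshape(-1, 1)
    y = np.concatenate([np.zeros(n_majority), np.ones(n_minority)])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    results.append((n_majority, clf.intercept_[0], clf.coef_[0, 0]))

for n, b0, b1 in results:
    # Intercept keeps falling; slope changes far less as imbalance grows.
    print(f"majority={n:>7d}  intercept={b0:+.3f}  slope={b1:+.3f}")
```

Under these assumptions the intercept absorbs the vanishing prior probability of the minority class, whereas the direction of the decision boundary stabilizes, mirroring the limits Glasserman and Li derive for a broader family of weight functions.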
