Abstract

A single algorithm drives an important health care decision for over 70 million people in the US. When health systems anticipate that a patient will have especially complex and intensive future health care needs, she is enrolled in a 'care management' program, which provides considerable additional resources: greater attention from trained providers and help with coordination of her care. To determine which patients will have complex future health care needs, and thus benefit from program enrollment, many systems rely on an algorithmically generated commercial risk score. In this paper, we exploit a rich dataset to study racial bias in a commercial algorithm that is deployed nationwide today in many of the US's most prominent Accountable Care Organizations (ACOs).

We document significant racial bias in this widely used algorithm, using data on primary care patients at a large hospital. Blacks and whites with the same algorithmic risk scores have very different realized health. For example, the highest-risk black patients (those at the threshold where patients are auto-enrolled in the program) have significantly more chronic illnesses than white enrollees with the same risk score. We use detailed physiological data to show the pervasiveness of the bias: across a range of biomarkers, from HbA1c levels for diabetics to blood pressure control for hypertensives, we find significant racial health gaps conditional on risk score. This bias has significant material consequences for patients: it effectively means that white patients with the same health as black patients are far more likely to be enrolled in the care management program and to benefit from its resources. If we simulated a world without this gap in predictions, blacks would be auto-enrolled into the program at more than double the current rate.

An unusual aspect of our dataset is that we observe not just the risk scores but also the input data and objective function used to construct them. This provides a unique window into the mechanisms by which bias arises. The algorithm is given a data frame with (1) a label $Y_{i,t}$, total medical expenditures ('costs') in year $t$; and (2) features $X_{i,t-1}$, fine-grained care utilization data in year $t-1$ (e.g., visits to cardiologists, number of x-rays, etc.). The algorithm's predicted risk of developing complex health needs is thus in fact predicted costs. And by this metric, one could easily call the algorithm unbiased: costs are very similar for black and white patients with the same risk scores. So far, this is inconsistent with algorithmic bias: conditional on risk score, predictions do not favor whites or blacks.
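To make this calibration contrast concrete, here is a minimal sketch of the kind of check the argument implies. It is not the paper's actual code; the file name and column names (risk_score, race, cost, n_chronic) are hypothetical, standing in for the algorithm's score, the patient's race, realized expenditures, and a realized-health measure such as the number of active chronic conditions.

```python
import pandas as pd

# Hypothetical patient-level data; all column names are illustrative.
#   risk_score : the commercial algorithm's output
#   race       : "black" or "white"
#   cost       : realized total medical expenditures in the prediction year
#   n_chronic  : realized number of active chronic conditions (a health measure)
df = pd.read_csv("patients.csv")

# Bin patients into deciles of the algorithmic risk score.
df["score_decile"] = pd.qcut(df["risk_score"], 10, labels=False)

# Within each decile, compare average realized outcomes by race.
by_group = df.groupby(["score_decile", "race"])[["cost", "n_chronic"]].mean()

# Calibration on the *label* (cost): within-decile black-white cost gaps
# should be near zero if the algorithm predicts costs without racial bias.
cost_gap = by_group["cost"].unstack("race").eval("black - white")

# Calibration on *health*: the same comparison on chronic-condition counts.
# The finding described above is that this gap is large even where the
# cost gap is not.
health_gap = by_group["n_chronic"].unstack("race").eval("black - white")

print(pd.DataFrame({"cost_gap": cost_gap, "health_gap": health_gap}))
```

The point of the sketch is that "unbiased" depends on which outcome one conditions on: the same grouping can show near-zero gaps on the label (cost) alongside large gaps on health.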
The fundamental problem we uncover is that when thinking about 'health care needs,' hospitals and insurers focus on costs. They use an algorithm whose specific objective is cost prediction, and from this perspective, predictions are accurate and unbiased. Yet from the social perspective, actual health -- not just costs -- also matters. This is where the problem arises: costs are not the same as health. While costs are a reasonable proxy for health (the sick do cost more, on average), they are an imperfect one: factors other than health can drive cost -- for example, race. We find that blacks cost more than whites on average, but this gap can be decomposed into two countervailing effects. First, blacks bear a different and larger burden of disease, making them costlier. But this difference in illness is offset by a second factor: blacks cost less, holding constant their exact chronic conditions, a force that dramatically reduces the overall cost gap. Perversely, the fact that blacks cost less than whites conditional on health means that an algorithm that predicts costs accurately across racial groups will necessarily also generate biased predictions of health. The root cause of this bias lies not in the procedure for prediction, nor in the underlying data, but in the algorithm's objective function itself. This bias is akin to, but distinct from, 'mis-measured labels': it arises here from the choice of labels, not their measurement, which is in turn a consequence of the differing objective functions of private actors in the health sector and society. From the private perspective, the variable they focus on -- cost -- is being appropriately optimized. But our results hint at how algorithms may amplify a fundamental problem in health care as a whole: externalities produced when health care providers focus too narrowly on financial motives, optimizing on costs to the detriment of health. In this sense, our results suggest that a pervasive problem in health care -- incentives that induce health systems to focus on dollars rather than health -- also has consequences for the way algorithms are built and monitored.
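One stylized way to write the two countervailing effects (notation introduced here for illustration, not the paper's): let $C_i$ denote cost, $H_i$ a vector of chronic-condition indicators, and $B_i$ an indicator for black patients, and posit the linear model

\[
C_i = \alpha + \gamma B_i + \beta' H_i + \varepsilon_i .
\]

Under this model, the average black-white cost gap decomposes as

\[
\mathbb{E}[C_i \mid B_i = 1] - \mathbb{E}[C_i \mid B_i = 0]
= \underbrace{\beta'\big(\mathbb{E}[H_i \mid B_i = 1] - \mathbb{E}[H_i \mid B_i = 0]\big)}_{\text{larger disease burden: } >\, 0}
\; + \underbrace{\gamma}_{\substack{\text{lower cost conditional} \\ \text{on health: } <\, 0}} .
\]

The first term pushes black patients' costs up; the second pulls them down, so a predictor trained to match $C_i$ accurately by race must understate black patients' health needs $H_i$.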
