Research Objective

Artificial intelligence and machine learning techniques are increasingly leveraged in health care and are drawing concerns about biases and racial disparities. Proprietary "black box" predictive models offer limited understanding of algorithmic bias and its potential impact on patients. From 2016 to 2018, the safety-net health system NYC Health + Hospitals (H+H) developed a payer-agnostic algorithm predicting future acute care utilization from past utilization, social determinants, demographics, insurance, and number of chronic conditions. High-risk patients are eligible for augmented care management. We sought to assess potential racial biases in our predictive model of acute care utilization.

Study Design

We examined bias in predicted vs. actual utilization in our model validation cohort. Stratifying by non-Hispanic White (White), non-Hispanic Black (Black), Hispanic, and Other race/ethnicity categories, we compared (a) the distribution of high-risk status; (b) model sensitivity and positive predictive value (PPV); and (c) the phenotype of "false negative" patients who were not flagged as high risk but became high utilizers. We conducted sensitivity analyses to explore the impact of the prediction threshold on bias, comparing the top 1% (extreme high risk) with the top 5% (implemented cutoff). (An illustrative sketch of this stratified evaluation follows the abstract.)

Population Studied

Validation cohort patients (N = 250,191) were randomly selected and represented 30% of adults who visited H+H from July 2017 to June 2018. Incarcerated or pregnant patients were excluded.

Principal Findings

The predicted distribution of race/ethnicity among high-risk patients was 44.4% Black, 22.9% Hispanic, 19.4% Other, and 13.3% White. The actual distribution among high-utilizing patients was 40.5% Black, 31.1% Hispanic, 19.4% Other, and 9% White. Black and White patients were overrepresented among predicted high-risk patients, and Hispanic patients were underrepresented.

PPVs were comparable across race/ethnicity (<5% absolute difference), but sensitivities varied greatly. The model was best at capturing risk among White patients (sensitivity: 35.5%), followed by Black patients (29.0%) and Other patients (25.0%). Hispanic patients had the lowest sensitivity (18.2%).

In phenotyping our false negatives, the model more frequently failed to identify female patients, particularly Hispanic, Other, or Black females, and uninsured patients, particularly Hispanic uninsured patients, suggesting possible interactions among race/ethnicity, gender, and insurance status. Sensitivity among Hispanic patients varied greatly by insurance status, from 27.5% (Medicaid) to 8.9% (uninsured). Differences by race/ethnicity were attenuated using the stricter top 1% definition of high risk in our sensitivity analyses.

Conclusions

Unlike the cost-oriented algorithm evaluated by Obermeyer et al., H+H's safety-net risk prediction model did not show bias against Black patients. Our model did, however, under-predict risk for Hispanic patients. Understanding the sources of this difference will require further examination. Preliminary studies suggest that Hispanic patients may use the system differently, engaging frequently in primary care but also under-utilizing health care overall because of insurance status. We are now evaluating the value of applying different risk thresholds by subpopulation to mitigate the effects of algorithmic bias for our patients.

Implications for Policy or Practice

In December 2019, following Obermeyer et al.'s article on model bias in Science, Sen. Ron Wyden and Sen. Cory Booker urged federal agencies and insurance companies to examine the prevalence and impact of bias in health care algorithms. This study provides an additional framework for those evaluations.
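
The stratified evaluation described under Study Design reduces to a short computation: flag the top X% of patients by predicted risk, compute sensitivity and PPV within each race/ethnicity group, and cross-tabulate the missed ("false negative") high utilizers by gender and insurance. The sketch below is a minimal illustration of that logic under stated assumptions, not the study's actual code; the DataFrame and its columns (risk_score, high_utilizer, race_ethnicity, sex, insurance) are hypothetical placeholders.

```python
import pandas as pd

def subgroup_performance(cohort: pd.DataFrame,
                         threshold_pct: float = 0.05,
                         score_col: str = "risk_score",
                         outcome_col: str = "high_utilizer",
                         group_col: str = "race_ethnicity") -> pd.DataFrame:
    """Flag the top `threshold_pct` of patients by predicted risk,
    then report sensitivity and PPV within each subgroup."""
    # Risk-score cutoff corresponding to the top X% of the whole cohort
    cutoff = cohort[score_col].quantile(1 - threshold_pct)
    flagged = cohort[score_col] >= cutoff

    rows = []
    for group, sub in cohort.groupby(group_col):
        sub_flagged = flagged.loc[sub.index]
        actual = sub[outcome_col] == 1
        tp = int((actual & sub_flagged).sum())    # flagged and became a high utilizer
        fn = int((actual & ~sub_flagged).sum())   # high utilizer the model missed
        fp = int((~actual & sub_flagged).sum())   # flagged but not a high utilizer
        rows.append({
            group_col: group,
            "n": len(sub),
            "share_flagged": float(sub_flagged.mean()),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

def phenotype_false_negatives(cohort: pd.DataFrame,
                              threshold_pct: float = 0.05) -> pd.DataFrame:
    """Cross-tabulate high utilizers the model failed to flag, to look for
    interactions among race/ethnicity, gender, and insurance status."""
    cutoff = cohort["risk_score"].quantile(1 - threshold_pct)
    missed = cohort[(cohort["high_utilizer"] == 1) & (cohort["risk_score"] < cutoff)]
    return (missed.groupby(["race_ethnicity", "sex", "insurance"])
                  .size()
                  .rename("false_negatives")
                  .reset_index())

# Example usage: compare the implemented top-5% cutoff with the stricter top-1% cutoff
# perf_5pct = subgroup_performance(cohort, threshold_pct=0.05)
# perf_1pct = subgroup_performance(cohort, threshold_pct=0.01)
# fn_profile = phenotype_false_negatives(cohort, threshold_pct=0.05)
```

Because the cutoff is set on the whole cohort rather than within each group, differences in subgroup sensitivity at a fixed PPV are the signal of differential under-prediction reported above; rerunning the same computation at the 1% and 5% thresholds reproduces the threshold sensitivity analysis.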