Incorporating machine learning and social determinants of health indicators into prospective risk adjustment for health plan payments

Jeremy A Irvin,Behzad Haghgoo,Andrew Y Ng,Andrew A Kondrich,Michael Ko,Sanjay Basu,Bruce E Landon,Pranav Rajpurkar,Robert L Phillips,Stephen Petterson

doi:10.1186/s12889-020-08735-0

Open Access

https://doi.org/10.1186/s12889-020-08735-0

Abstract

Background: Risk adjustment models are employed to prevent adverse selection, anticipate budgetary reserve needs, and offer care management services to high-risk individuals. We aimed to address two unknowns about risk adjustment: whether machine learning (ML) and inclusion of social determinants of health (SDH) indicators improve prospective risk adjustment for health plan payments.

Methods: We employed a 2-by-2 factorial design comparing: (i) linear regression versus ML (gradient boosting) and (ii) demographics and diagnostic codes alone, versus additional ZIP code-level SDH indicators. Healthcare claims from privately-insured US adults (2016–2017), and Census data were used for analysis. Data from 1.02 million adults were used for derivation, and data from 0.26 million to assess performance. Model performance was measured using coefficient of determination (R²), discrimination (C-statistic), and mean absolute error (MAE) for the overall population, and predictive ratio and net compensation for vulnerable subgroups. We provide 95% confidence intervals (CI) around each performance measure.

Results: Linear regression without SDH indicators achieved moderate determination (R² 0.327; 95% CI: 0.300, 0.353), error ($6992; 95% CI: $6889, $7094), and discrimination (C-statistic 0.703; 95% CI: 0.701, 0.705). ML without SDH indicators improved all metrics (R² 0.388; 95% CI: 0.357, 0.420; error $6637; 95% CI: $6539, $6735; C-statistic 0.717; 95% CI: 0.715, 0.718), reducing misestimation of cost by $3.5 M per 10,000 members. Among people living in areas with high poverty, high wealth inequality, or high prevalence of uninsured, SDH indicators reduced underestimation of cost, improving the predictive ratio by 3% (~$200/person/year).

Conclusions: ML improved risk adjustment models and the incorporation of SDH indicators reduced underpayment in several vulnerable populations.
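The performance measures named in the abstract can be illustrated with a short sketch. The abstract does not spell out its formulas, so the definitions below are assumptions based on common usage: predictive ratio as mean predicted cost divided by mean actual cost within a group, and net compensation as mean predicted cost minus mean actual cost. The function names and the four-person toy data are hypothetical and are not taken from the study. The last lines reproduce the arithmetic behind the reported figure of roughly $3.5 M per 10,000 members, using the two MAE values given in the abstract.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between predicted and actual cost per person."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def predictive_ratio(actual, predicted):
    """Assumed definition: mean predicted / mean actual (below 1 = underpayment)."""
    return mean(predicted) / mean(actual)


def net_compensation(actual, predicted):
    """Assumed definition: mean predicted - mean actual, in dollars per person."""
    return mean(predicted) - mean(actual)


# Hypothetical toy subgroup (annual cost in dollars), for illustration only.
actual = [1000.0, 4000.0, 12000.0, 3000.0]
predicted = [1500.0, 3500.0, 9000.0, 3000.0]

print("MAE:", mean_absolute_error(actual, predicted))
print("R^2:", round(r_squared(actual, predicted), 3))
print("Predictive ratio:", predictive_ratio(actual, predicted))
print("Net compensation:", net_compensation(actual, predicted))

# Arithmetic behind the abstract's "$3.5 M per 10,000 members":
# MAE of linear regression minus MAE of ML, scaled to 10,000 members.
mae_linear, mae_ml = 6992, 6637
reduction_per_10k = (mae_linear - mae_ml) * 10_000
print("Misestimation reduction per 10,000 members:", reduction_per_10k)
```

In this toy subgroup the predictive ratio is below 1 and the net compensation is negative, which is the pattern the abstract describes as underestimation of cost for vulnerable populations.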

Highlights

Risk adjustment models are employed to prevent adverse selection, anticipate budgetary reserve needs, and offer care management services to high-risk individuals
While traditional risk adjustment models are limited in modeling complexity and tend to underpredict expenditures of populations with very high expenditures [6, 7], machine learning methods may help to capture complex non-linear relationships and interaction terms among variables, which could explain why some individuals with complex constellations of risk factors and diagnoses experience substantially higher cost than predicted
The objective of this study was to assess whether prospective risk adjustment models may be improved by machine learning methods and by the incorporation of area-level social determinants of health (SDH) indicators in a national privately-insured adult population

Summary

Introduction

Risk adjustment models are employed to prevent adverse selection, anticipate budgetary reserve needs, and offer care management services to high-risk individuals. While traditional risk adjustment models are limited in modeling complexity and tend to underpredict costs for populations with very high expenditures [6, 7], machine learning methods may help to capture complex non-linear relationships and interaction terms among variables, which could explain why some individuals with complex constellations of risk factors and diagnoses experience substantially higher cost than predicted. For example, among people with low income and diabetes receiving insulin, food insecurity is associated with hypoglycemia and emergency room visits during the last week of each month, when income from a first-of-the-month paycheck is depleted but hypoglycemic medications are still being taken [8]. Such complex relationships are hard to model in standard risk equations but can potentially be better captured by interaction-focused, nonlinear machine learning algorithms. To date, however, machine learning models have not demonstrated superior predictive performance over traditional linear models on large datasets with more than a million enrollees [2].

Objectives

Methods

Results