Measurement of health-related social needs (HRSNs) is complex. We sought to develop and validate computable phenotypes (CPs) using structured electronic health record (EHR) data for food insecurity, housing instability, financial insecurity, transportation barriers, and a composite-type measure of these, using human-defined rule-based and machine learning (ML) classifier approaches. We collected HRSN surveys as the reference standard and obtained EHR data from 1550 patients in 3 health systems from 2 states. We followed a Delphi-like approach to develop the human-defined rule-based CP. For the ML classifier approach, we trained supervised ML (XGBoost) models using 78 features. Using surveys as the reference standard, we calculated sensitivity, specificity, positive predictive values, and area under the curve (AUC). We compared AUCs using the Delong test and other performance measures using McNemar's test, and checked for differential performance. Most patients (63%) reported at least one HRSN on the reference standard survey. Human-defined rule-based CPs exhibited poor performance (AUCs=.52 to .68). ML classifier CPs performed significantly better, but still poor-to-fair (AUCs = .68 to .75). Significant differences for race/ethnicity were found for ML classifier CPs (higher AUCs for White non-Hispanic patients). Important features included number of encounters and Medicaid insurance. Using a supervised ML classifier approach, HRSN CPs approached thresholds of fair performance, but exhibited differential performance by race/ethnicity. CPs may help to identify patients who may benefit from additional social needs screening. Future work should explore the use of area-level features via geospatial data and natural language processing to improve model performance.
Read full abstract