Purpose: To develop and internally validate machine learning prediction models for future potentially preventable healthcare utilization in patients with multiple long-term conditions (MLTC). This study is the first step in investigating whether prediction models can help identify the patients with MLTC who are most in need of integrated care.

Methods: A retrospective cohort study was performed with electronic health record data from adults with MLTC at an academic medical center in the Netherlands. Based on demographic and healthcare utilization characteristics in 2017, we predicted three outcomes in 2018: ≥ 12 outpatient visits, ≥ 1 emergency department (ED) visit, and ≥ 1 acute hospitalization. For each outcome, four machine learning models (elastic net regression, extreme gradient boosting (XGB), logistic regression, and random forest) were developed, optimized, and evaluated in a hold-out dataset.

Results: A total of 14,486 patients with MLTC were included. Based on the area under the curve (AUC) and calibration curves, the XGB model was selected as the final model for all three outcomes. The AUC was 0.82 for ≥ 12 outpatient visits, 0.76 for ≥ 1 ED visit, and 0.73 for ≥ 1 acute hospitalization. Despite adequate AUC and calibration, precision-recall curves showed suboptimal performance.

Conclusions: The final selected model for each outcome can identify patients with future potentially preventable high healthcare utilization. However, identifying high-risk patients with MLTC and substantiating whether they are most in need of integrated care remains challenging. Further research is warranted to investigate whether patients with high healthcare utilization are indeed the most in need of integrated care, and whether quantitatively identified patients match those identified through clinicians' experience and judgment.