Machine Learning-Based Prediction for High Health Care Utilizers by Using a Multi-Institutional Diabetes Registry: Model Training and Evaluation.

Joshua Kuan Tan, Le Quan, Nur Nasyitah Mohamed Salim, Jen Hong Tan, Su-Yen Goh, Julian Thumboo, Yong Mong Bee

doi:10.2196/58463

Abstract

The cost of health care in many countries is increasing rapidly. There is a growing interest in using machine learning for predicting high health care utilizers for population health initiatives. Previous studies have focused on individuals who contribute to the highest financial burden. However, this group is small and represents a limited opportunity for long-term cost reduction. We developed a collection of models that predict future health care utilization at various thresholds.

We utilized data from a multi-institutional diabetes database from the year 2019 to develop binary classification models. These models predict health care utilization in the subsequent year across 6 different outcomes: patients having a length of stay of ≥7, ≥14, and ≥30 days and emergency department attendance of ≥3, ≥5, and ≥10 visits. To address class imbalance, random and synthetic minority oversampling techniques were employed. The models were then applied to unseen data from 2020 and 2021 to predict health care utilization in the following year. A portfolio of performance metrics, with priority on area under the receiver operating characteristic curve, sensitivity, and positive predictive value, was used for comparison. Explainability analyses were conducted on the best performing models.

When trained with random oversampling, 4 models (logistic regression, multivariate adaptive regression splines, boosted trees, and multilayer perceptron) consistently achieved high area under the receiver operating characteristic curve (>0.80) and sensitivity (>0.60) across training-validation and test data sets. Correcting for class imbalance proved critical for model performance. Important predictors for all outcomes included age, number of emergency department visits in the present year, chronic kidney disease stage, inpatient bed days in the present year, and mean hemoglobin A1c levels. Explainability analyses using partial dependence plots demonstrated that for the best performing models, the learned patterns were consistent with real-world knowledge, thereby supporting the validity of the models.

We successfully developed machine learning models capable of predicting high service level utilization with strong performance and valid explainability. These models can be integrated into wider diabetes-related population health initiatives.
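
The outcome definitions, the random oversampling step, and two of the prioritized metrics described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state the software, features, or exact resampling procedure used, so the function names (`make_labels`, `random_oversample`, `sensitivity_and_ppv`) and the choice to balance the classes to equal size are assumptions made purely for illustration. Only the six thresholds come from the abstract.

```python
import random

# Thresholds taken from the abstract; everything else here is illustrative.
LOS_THRESHOLDS = (7, 14, 30)   # inpatient length of stay, in days
ED_THRESHOLDS = (3, 5, 10)     # emergency department visits


def make_labels(los_days, ed_visits):
    """Return the six binary outcomes for one patient in the following year."""
    labels = {f"los_ge_{t}": int(los_days >= t) for t in LOS_THRESHOLDS}
    labels.update({f"ed_ge_{t}": int(ed_visits >= t) for t in ED_THRESHOLDS})
    return labels


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match.

    Assumption: classes are balanced 1:1. The abstract does not give the ratio.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    if not minority:
        raise ValueError("cannot oversample: one class has no examples")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    out_rows = majority + minority + extra
    out_labels = ([1 - minority_label] * len(majority)
                  + [minority_label] * (len(minority) + len(extra)))
    return out_rows, out_labels


def sensitivity_and_ppv(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv
```

In a setup like this, oversampling would normally be applied only to the training portion of the data, so that validation and test sets keep the real-world class ratio against which sensitivity and positive predictive value are judged.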
