Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): an international multicentre study with subgroup bias evaluation

Xiaoli Liu, Pan Hu, Wesley Yeung, Zhongheng Zhang, Vanda Ho, Chao Liu, Clark Dumontier, Patrick J Thoral, Zhi Mao, Desen Cao, Roger G Mark, Zhengbo Zhang, Mengling Feng, Deyu Li, Leo Anthony Celi

doi:10.1016/s2589-7500(23)00128-0

Abstract

Comorbidity, frailty, and decreased cognitive function lead to a higher risk of death in elderly patients (more than 65 years of age) during acute medical events. Early and accurate illness severity assessment can support appropriate decision making for clinicians caring for these patients. We aimed to develop ELDER-ICU, a machine learning model to assess the illness severity of older adults admitted to the intensive care unit (ICU), with cohort-specific calibration and evaluation for potential model bias.
In this retrospective, international multicentre study, the ELDER-ICU model was developed using data from 14 US hospitals and validated in 171 hospitals from the USA and the Netherlands. Data were extracted from the Medical Information Mart for Intensive Care database, electronic ICU Collaborative Research Database, and Amsterdam University Medical Centers Database. We used six categories of data as predictors: demographics and comorbidities, physical frailty, laboratory tests, vital signs, treatments, and urine output. Patient data from the first day of ICU stay were used to predict in-hospital mortality. We used the eXtreme Gradient Boosting algorithm (XGBoost) to develop models and the SHapley Additive exPlanations method to explain model predictions. The trained model was calibrated before internal, external, and temporal validation. The final XGBoost model was compared against three other machine learning algorithms and five clinical scores. We performed subgroup analysis based on age, sex, and race. We assessed the discrimination and calibration of models using the area under the receiver operating characteristic curve (AUROC) and standardised mortality ratio (SMR) with 95% CIs.
Using the development dataset (n=50 366) and the predictive model building process, the XGBoost algorithm performed best in all types of validation compared with the other machine learning algorithms and clinical scores (internal validation with 5037 patients from 14 US hospitals, AUROC=0·866 [95% CI 0·851-0·880]; external validation in the US population with 20 541 patients from 169 hospitals, AUROC=0·838 [0·829-0·847]; external validation in the European population with 2411 patients from one hospital, AUROC=0·833 [0·812-0·853]; temporal validation with 4311 patients from one hospital, AUROC=0·884 [0·869-0·897]). In the external validation set (US population), the median AUROCs of bias evaluations covering eight subgroups were above 0·81, and the overall SMR was 0·99 (0·96-1·03). The top ten risk predictors were the minimum Glasgow Coma Scale score, total urine output, average respiratory rate, mechanical ventilation use, best state of activity, Charlson Comorbidity Index score, geriatric nutritional risk index, code status, age, and maximum blood urea nitrogen. A simplified model containing only the top 20 features (ELDER-ICU-20) had similar predictive performance to the full model.
The ELDER-ICU model reliably predicts the risk of in-hospital mortality using routinely collected clinical features. The predictions could inform clinicians about patients who are at elevated risk of deterioration. Prospective validation of this model in clinical practice and a process for continuous performance monitoring and model recalibration are needed.
Funding: National Institutes of Health, National Natural Science Foundation of China, National Special Health Science Program, Health Science and Technology Plan of Zhejiang Province, Fundamental Research Funds for the Central Universities, Drug Clinical Evaluate Research of Chinese Pharmaceutical Association, and National Key R&D Program of China.
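
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. AUROC is computed here from ranks (the Mann-Whitney formulation), and SMR is taken as observed deaths divided by the sum of predicted death probabilities. The abstract does not say how the 95% CIs were obtained; the percentile bootstrap below is an assumption made for illustration, and the function names (`auroc`, `smr`, `bootstrap_ci`) are hypothetical, not taken from the study's code.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied scores starting at i.
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both outcomes to be present")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def smr(y_true, y_prob):
    """Standardised mortality ratio: observed deaths / expected deaths."""
    return sum(y_true) / sum(y_prob)


def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed method, not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        # Skip resamples with a single outcome class; AUROC is undefined there.
        if 0 < sum(yt) < n:
            stats.append(metric(yt, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up toy data (1 = died in hospital), for illustration only.
toy_outcomes = [0, 0, 1, 1]
toy_risks = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_outcomes, toy_risks)
toy_smr = smr(toy_outcomes, toy_risks)
toy_auroc_ci = bootstrap_ci(auroc, toy_outcomes, toy_risks)
```

An SMR close to 1, as reported for the US external validation set, means the predicted number of deaths is close to the observed number; values above 1 indicate more deaths than the model predicted, and values below 1 indicate fewer.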

Full Text