Predicting hospital mortality for patients in the intensive care unit: a comparison of artificial neural networks with logistic regression models.

Gilles Clermont, Derek C Angus, Walter T Linde-Zwirble, Stephen M Dirusso, Martin Griffin

doi:10.1097/00003246-200102000-00012

Abstract

Objective: Logistic regression (LR), commonly used for hospital mortality prediction, has limitations. Artificial neural networks (ANNs) have been proposed as an alternative. We compared the performance of these approaches by using stepwise reductions in sample size.

Design: Prospective cohort study.

Setting: Seven intensive care units (ICUs) at one tertiary care center.

Patients: 1,647 ICU admissions for whom first-day Acute Physiology and Chronic Health Evaluation III variables were collected.

Interventions: None.

Measurements and Main Results: We constructed LR and ANN models on a random set of 1,200 admissions (development set) and used the remaining 447 as the validation set. We repeated model construction on progressively smaller development sets (800, 400, and 200 admissions) and retested on the original validation set (n = 447). For each development set, we constructed models from two LR and two ANN architectures, organizing the independent variables differently. With the 1,200-admission development set, all models had good fit and discrimination on the validation set, where fit was assessed by the Hosmer-Lemeshow C statistic (range, 10.6-15.3; p ≥ .05) and standardized mortality ratio (SMR) (range, 0.93 [95% confidence interval, 0.79-1.15] to 1.09 [95% confidence interval, 0.89-1.38]), and discrimination was assessed by the area under the receiver operating characteristic curve (range, 0.80-0.84). As development set sample size decreased, model performance on the validation set deteriorated rapidly, although the ANNs retained marginally better fit at 800 (best C statistic was 26.3 [p = .0009] and 13.1 [p = .11] for the LR and ANN models, respectively). Below 800, fit was poor with both approaches, with high C statistics (ranging from 22.8 [p < .004] to 633 [p < .0001]) and highly biased SMRs (seven of the eight models below 800 had SMRs of <0.85, with an upper confidence limit of <1). Discrimination ranged from 0.74 to 0.84 below 800.

Conclusions: When sample size is adequate, LR and ANN models have similar performance. However, development sets of ≤ 800 were generally inadequate. This is concerning, given typical sample sizes used for individual ICU mortality prediction.
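
The three validation measures named in the abstract (Hosmer-Lemeshow C statistic, SMR, and area under the receiver operating characteristic curve) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice of 10 equal-size risk groups, and the synthetic validation data are all assumptions made for illustration, and the p value for the C statistic (a chi-square tail probability) is omitted to keep the example dependency-free.

```python
import random


def auc(y_true, y_prob):
    """Area under the ROC curve: the probability that a randomly chosen
    death has a higher predicted risk than a randomly chosen survivor
    (ties count one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def smr(y_true, y_prob):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(y_true) / sum(y_prob)


def hosmer_lemeshow_c(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow C statistic over equal-size groups of sorted risk.
    The default of 10 groups is an assumption, not taken from the source."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


if __name__ == "__main__":
    # Synthetic stand-in for a 447-admission validation set (hypothetical data).
    rng = random.Random(0)
    probs = [rng.random() * 0.6 for _ in range(447)]
    deaths = [1 if rng.random() < p else 0 for p in probs]
    print("AUC:", round(auc(deaths, probs), 3))
    print("SMR:", round(smr(deaths, probs), 3))
    print("H-L C:", round(hosmer_lemeshow_c(deaths, probs), 2))
```

In this framing, a well-calibrated model yields an SMR near 1 and a small C statistic, while discrimination is summarized separately by the area under the curve, which is why a model can keep an acceptable area under the curve while its fit deteriorates.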

Full Text