BackgroundThe Acute Physiology and Chronic Health Evaluation (APACHE) III-j model is widely used to predict mortality in Japanese intensive care units (ICUs). Although the model’s discrimination is excellent, its calibration is poor. APACHE III-j overestimates the risk of death, making its evaluation of healthcare quality inaccurate. This study aimed to improve the calibration of the model and develop a Japan Risk of Death (JROD) model for benchmarking purposes.MethodsA retrospective analysis was conducted using a national clinical registry of ICU patients in Japan. Adult patients admitted to an ICU between April 1, 2018, and March 31, 2019, were included. The APACHE III-j model was recalibrated with the following models: Model 1, predicting mortality with an offset variable for the linear predictor of the APACHE III-j model using a generalized linear model; model 2, predicting mortality with the linear predictor of the APACHE III-j model using a generalized linear model; and model 3, predicting mortality with the linear predictor of the APACHE III-j model using a hierarchical generalized additive model. Model performance was assessed with the area under the receiver operating characteristic curve (AUROC), the Brier score, and the modified Hosmer–Lemeshow test. To confirm model applicability to evaluating quality of care, funnel plots of the standardized mortality ratio and exponentially weighted moving average (EWMA) charts for mortality were drawn.ResultsIn total, 33,557 patients from 44 ICUs were included in the study population. ICU mortality was 3.8%, and hospital mortality was 8.1%. The AUROC, Brier score, and modified Hosmer–Lemeshow p value of the original model and models 1, 2, and 3 were 0.915, 0.062, and < .001; 0.915, 0.047, and < .001; 0.915, 0.047, and .002; and 0.917, 0.047, and .84, respectively. Except for model 3, the funnel plots showed overdispersion. The validity of the EWMA charts for the recalibrated models was determined by visual inspection.ConclusionsModel 3 showed good performance and can be adopted as the JROD model for monitoring quality of care in an ICU, although further investigation of the clinical validity of outlier detection is required. This update method may also be useful in other settings.