The conservativeness of standard C statistics in the prediction of clinical events.

Abstract

The C statistic, also known as the concordance index (C-index), is widely used in clinical research to assess the discriminative ability of risk prediction models. Its appeal lies in its intuitive interpretation and broad applicability, particularly in fields such as cardiovascular medicine and oncology, where accurate risk stratification is essential. However, despite its popularity, the C statistic has notable limitations that can undermine its utility in both research and clinical practice. Chief among these is its inherent conservativeness: the C statistic is often insensitive to meaningful improvements in model performance when new biomarkers or risk factors are added to an already robust model. This insensitivity stems from its rank-based nature, which focuses solely on the correct ordering of risk predictions rather than the magnitude of improvement. As a result, significant advances in risk estimation may be overlooked, potentially discouraging the adoption of clinically valuable innovations. Furthermore, the C statistic does not account for calibration (the agreement between predicted and observed outcomes) or the clinical consequences of misclassification. Alternative metrics, such as the Mean Absolute Difference (MAD), Brier score, and Net Reclassification Improvement (NRI), offer complementary perspectives by capturing aspects of predictive accuracy and clinical relevance that the C statistic may miss. A comprehensive evaluation of risk models should therefore integrate these alternative measures to ensure that predictive tools are both statistically robust and clinically meaningful, ultimately advancing patient care and the practice of precision medicine.
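
The rank-based behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the outcome vector and both sets of predicted risks are hypothetical, and the helper names `c_statistic` and `brier_score` are introduced here only for illustration. The two models order the patients identically, so the C statistic cannot tell them apart, while the Brier score registers the change in the predicted risks.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical data: 1 = clinical event, 0 = no event.
y = [0, 0, 0, 1, 0, 1, 1, 1]
# Hypothetical base model: correct ordering for most pairs, compressed risks.
base_model = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
# Hypothetical extended model: same patient ordering, risks moved toward
# the observed outcomes for most patients.
extended_model = [0.05, 0.10, 0.15, 0.50, 0.55, 0.85, 0.90, 0.95]

c_base = c_statistic(y, base_model)
c_extended = c_statistic(y, extended_model)
brier_base = brier_score(y, base_model)
brier_extended = brier_score(y, extended_model)

print(f"C statistic: base={c_base:.4f}, extended={c_extended:.4f}")
print(f"Brier score: base={brier_base:.4f}, extended={brier_extended:.4f}")
```

This is the limiting case, in which the ranking does not change at all. In practice a new marker usually reorders some patients, so the C statistic moves a little, but the sketch shows why that movement can be small even when the predicted risks change substantially.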

Similar Papers
  • Research Article
  • Citations: 13
  • 10.1097/cce.0000000000000580
Dynamic Risk Prediction for Hospital-Acquired Pressure Injury in Adult Critical Care Patients
  • Nov 11, 2021
  • Critical Care Explorations
  • Amy M Shui + 8 more

To develop and validate a dynamic risk prediction model to estimate the risk of developing a hospital-acquired pressure injury among adult ICU patients. ICU admission data were split into training and validation sets. With death as a competing event, both static and dynamic Fine-Gray models were developed to predict hospital-acquired pressure injury development less than 24, 72, and 168 hours postadmission. Model performance was evaluated using Wolbers' concordance index, Brier score, net reclassification improvement, and integrated discrimination improvement. We performed a retrospective cohort study of ICU patients in a tertiary care hospital located in San Francisco, CA, from November 2013 to August 2017. Data were extracted from electronic medical records of 18,019 ICU patients (age ≥ 18 yr; 21,220 encounters). Record of hospital-acquired pressure injury data was captured in our institution's incident reporting system. The information is periodically reviewed by our wound care team. Presence of hospital-acquired pressure injury during an encounter and hospital-acquired pressure injury diagnosis date were provided. The dynamic model predicting hospital-acquired pressure injury more than 24 hours postadmission, including predictors age, body mass index, lactate serum, Braden scale score, and use of vasopressor and antifungal medications, had adequate discrimination ability within 6 days from time of prediction (c = 0.73). All dynamic models produced more accurate risk estimates than static models within 26 days postadmission. There were no significant differences in Brier scores between dynamic and static models. A dynamic risk prediction model predicting hospital-acquired pressure injury development less than 24 hours postadmission in ICU patients for up to 7 days postadmission was developed and validated using a large dataset of clinical variables readily available in the electronic medical record.

  • Research Article
  • 10.1161/circ.144.suppl_1.12150
Abstract 12150: Incorporation of Natriuretic Peptides With Clinical Risk-Scores to Predict Heart Failure Among Individuals With Dysglycemia
  • Nov 16, 2021
  • Circulation
  • Matthew W Segar + 13 more

Introduction: The WATCH-DM score can predict risk of heart failure (HF) in patients with diabetes. Hypothesis: Addition of natriuretic peptide (NP) levels will improve WATCH-DM performance in individuals with dysglycemia. Methods: Adults with diabetes/pre-diabetes free of HF at baseline from 4 cohort studies (ARIC, CHS, FHS, and MESA) were included. The integer- [WATCH-DM(i)] and machine learning-based [WATCH-DM(ml)] scores were used to estimate the 5-year risk of incident HF. Discrimination was assessed by Harrell's concordance index (C-index) and calibration by the Greenwood-Nam-D'Agostino (GND) statistic. Improvement in model performance with the addition of NP levels was assessed by C-index, Brier score, and continuous net reclassification improvement (NRI). Results: Of the 8,938 participants included, 3,554 (39.8%) had diabetes and 432 (4.8%) developed HF within 5 years. Among 5,384 (60.2%) participants with pre-diabetes, 647 (12.0%) developed incident HF. The WATCH-DM(ml) and (i) scores demonstrated high discrimination for predicting HF risk in diabetes (C-indices = 0.76 and 0.69), pre-diabetes (0.83 and 0.72), and the overall cohort (0.80 and 0.71), respectively, with no evidence of miscalibration (GND P > 0.10). A greater improvement in C-index was observed with the addition of NP levels at lower WATCH-DM(i) scores, with degradation of risk discrimination at higher scores (Fig. A). Calibration was also improved with the addition of NP levels at lower compared to higher WATCH-DM(i) scores (Fig. B). A greater improvement in reclassification was observed by combining the WATCH-DM(i) score with selected NP-level assessment in low-risk (score < 13) vs. high-risk (≥ 13) participants (NRI = 0.45 vs. 0.17; p-value < 0.001). Conclusions: The WATCH-DM risk score can accurately predict incident HF risk in community-based individuals with dysglycemia. The addition of NP levels improves risk prediction among adults with low/intermediate but not high HF risk.

  • Research Article
  • 10.11817/j.issn.1672-7347.2025.250191
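Nomogram and machine learning for predicting the risk of in-hospital death in patients with sepsis complicated by deep vein thrombosis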
  • Jun 28, 2025
  • Journal of Central South University Medical Sciences
  • 洪伟 段 + 3 more

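Objective: Global epidemiological data show that 20% to 30% of patients with sepsis in the intensive care unit (ICU) progress to deep vein thrombosis (DVT) owing to concomitant coagulopathy, with an associated mortality of 25% to 40%. Existing prognostic assessment tools, however, have limitations. This study aimed to construct a nomogram and a machine learning model to predict the risk of in-hospital death in patients with sepsis complicated by DVT, and to analyze their clinical applicability. Methods: Multicenter retrospective data were drawn from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database (n = 2,235), the eICU Collaborative Research Database (eICU-CRD) (n = 1,274), and the ICU admission dataset of the Third Xiangya Hospital of Central South University (defined as the CSU-XYS-ICU dataset) (n = 107). MIMIC-IV was split 7:3 into a training set (n = 1,584) and an internal validation set (n = 651); the remaining datasets served as external validation sets. Variables were selected using least absolute shrinkage and selection operator (LASSO) regression and the Bayesian information criterion (BIC), and a nomogram model was constructed; a machine learning model was built with extreme gradient boosting (XGBoost). Evaluation metrics included the C-index, calibration curves, Brier score, decision curve analysis (DCA), and net reclassification improvement (NRI). Results: LASSO regression and BIC identified 5 key predictors, which were used to build the nomogram: age [odds ratio (OR) = 1.02, 95% CI 1.01–1.03, P < 0.001], minimum activated partial thromboplastin time (APTT) (OR = 1.09, 95% CI 1.08–1.11, P < 0.001), maximum APTT (OR = 1.01, 95% CI 1.00–1.01, P < 0.001), maximum lactate (OR = 1.56, 95% CI 1.39–1.75, P < 0.001), and maximum serum creatinine (OR = 2.03, 95% CI 1.79–2.30, P < 0.001). The model performed robustly in internal validation (C-index = 0.845, 95% CI 0.811–0.879) and external validation (eICU-CRD, C-index = 0.827, 95% CI 0.800–0.854; CSU-XYS-ICU, C-index = 0.779, 95% CI 0.687–0.871). Calibration curves showed high agreement between predicted and observed outcomes (Brier score < 0.25), and DCA confirmed clinical benefit. The XGBoost model had an area under the receiver operating characteristic (ROC) curve (AUC) of 0.982 (95% CI 0.969–0.985) in the training set, but its performance declined in external validation (eICU-CRD, AUC = 0.825, 95% CI 0.817–0.861; CSU-XYS-ICU, AUC = 0.766, 95% CI 0.700–0.873), although it remained above the clinical threshold. The XGBoost model showed a slightly lower net benefit than the nomogram model (NRI = 0.58). Conclusion: Both the nomogram and XGBoost can effectively predict the risk of in-hospital death in patients with sepsis complicated by DVT, but the nomogram has advantages in generalizability and clinical applicability, and its visual scoring system provides a quantitative tool for identifying high-risk patients and implementing individualized interventions.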

  • Research Article
  • Citations: 1
  • 10.5455/medscience.2022.03.078
Comparison of Performance of Deep Survival and Cox Proportional Hazard Models: an Application on the Lung Cancer Dataset
  • Jan 1, 2022
  • Medicine Science | International Medical Journal
  • Kubra Akbas + 3 more

The goal of this study is to compare the performance of the deep survival model and the Cox regression model on an open-access lung cancer dataset consisting of survivors and deceased patients. The Cox proportional hazards (CPH) and DeepSurv models were both applied to the open-access dataset named "Lung Cancer Data". Model performance was evaluated by C-index, AUC, and Brier score. For the deep survival model, the concordance index was 0.64296, the Brier score was 0.128921, and the AUC was 0.6835. For the Cox regression model, the concordance index was 0.61445, the Brier score was 0.1667, and the AUC was 0.5832. On all three criteria, the deep survival model performed better than the Cox regression model. DeepSurv's forecasting, modeling, and predictive capabilities pave the way for future research on deep neural networks and survival analysis. With more research, DeepSurv has the potential to supplement traditional survival analysis methods and become the standard method for medical doctors to examine and offer individualized treatment alternatives.

  • Research Article
  • Citations: 10
  • 10.1186/s12872-018-0912-3
Coronary calcium score improves the estimation for pretest probability of obstructive coronary artery disease and avoids unnecessary testing in individuals at low extreme of traditional risk factor burden: validation and comparison of CONFIRM score and genders extended model
  • Aug 29, 2018
  • BMC Cardiovascular Disorders
  • Minghui Wang + 5 more

Background: The reliability of models for estimating pretest probability (PTP) of obstructive coronary artery disease (CAD) has not been investigated in individuals at the low extreme of traditional risk factor (RF) burden. Thus, we sought to validate and compare the CONFIRM score and the Genders extended model (GEM) among these individuals. Methods: We identified symptomatic individuals with 0 or 1 RF who underwent coronary calcium scan and coronary computed tomographic angiography (CCTA). Follow-up clinical data were also recorded. PTP of obstructive CAD for every individual was estimated according to the CONFIRM score and GEM, respectively. Area under the receiver operating characteristic curve (AUC), integrated discrimination improvement (IDI), net reclassification improvement (NRI), and the Hosmer–Lemeshow (H-L) test were used to assess the performance of the models. Results: There were 1201 individuals with 0 RF and 2415 with 1 RF. The AUC for GEM was significantly larger than that for the CONFIRM score, both in individuals with 0 RF (0.843 vs. 0.762, p < 0.0001) and in those with 1 RF (0.823 vs. 0.752, p < 0.0001). Compared to the CONFIRM score, GEM demonstrated positive IDI (5% in individuals with 0 RF and 8% in individuals with 1 RF), positive NRI (41.50% in individuals with 0 RF and 40.19% in individuals with 1 RF), better prediction of clinical events, and less discrepancy between observed and predicted probabilities, resulting in a significant decrease in unnecessary testing, especially in negative individuals. Conclusion: In individuals at the low extreme of traditional RF burden of CAD, the addition of coronary calcium score provided a more accurate estimation of PTP, and application of GEM instead of the CONFIRM score could avoid unnecessary testing.

  • Research Article
  • Citations: 1
  • 10.3389/fendo.2025.1687289
TyG × waist circumference composite indicator and cardiovascular disease risk in older adults across multiple regions: a cross-sectional study
  • Oct 22, 2025
  • Frontiers in Endocrinology
  • Ying Guo + 5 more

Objective: To investigate the association between the triglyceride-glucose index combined with waist circumference (TyG×WC) and cardiovascular disease (CVD) risk in older adults across multiple populations. Methods: This study utilized data from three population sources: NHANES (2011–2018), a Chinese community cohort, and a tertiary hospital, enrolling a total of 3,443 eligible older adults. The TyG index was calculated as ln [fasting triglycerides (mg/dL) × fasting glucose (mg/dL)/2], and then multiplied by waist circumference (WC). The resulting TyG×WC values were standardized using z-score normalization and subsequently categorized into quartiles. CVD status was used as the outcome variable. Multivariable logistic regression models were constructed to evaluate the association between TyG×WC and CVD risk. Trend tests and subgroup analyses by sex and region were also performed. Model performance was assessed using receiver operating characteristic (ROC) curves, the DeLong test, net reclassification improvement (NRI), integrated discrimination improvement (IDI), Brier score, and 10-fold cross-validation. Clinical utility was evaluated through decision curve analysis (DCA), while E-value analysis was used to estimate the potential impact of unmeasured confounding. The trend effect across the three populations was synthesized using random-effects meta-analysis to assess heterogeneity. Results: A total of 3,443 participants were included: 1,684 from NHANES (48.91%), 1,263 hospitalized patients from a tertiary hospital (36.68%), and 496 from a community cohort (14.41%). Significant differences were observed across regions in age, TG, TC, LDL, HDL, FPG, ACR, HbA1c, BMI, WC, uric acid, TyG, gender, and CVD prevalence. Multivariable logistic regression indicated a significant positive association between the TyG×WC index and CVD risk. After adjusting for confounders, participants in Q3 and Q4 had significantly higher CVD risk (OR = 1.94 and 2.47, respectively; both P < 0.001), with a significant linear trend (P for trend = 2.44×10⁻¹⁹). Subgroup analyses showed a stronger predictive effect in females (Q4 vs Q1: OR = 2.34, 95% CI: 1.75–3.14) and in the NHANES population (Q4 vs Q1: OR = 4.64, 95% CI: 3.19–6.85). Heterogeneity analysis revealed no significant differences across regions (I² = 30.3%, P = 0.238). Regarding model performance, the extended model including TyG×WC showed an improvement in AUC (from 0.692 to 0.701, DeLong P = 0.038), along with significant improvements in NRI (0.222, P < 0.001), IDI (0.0215, P < 0.001), and favorable DCA results. The E-value analysis indicated robust results against unmeasured confounding (point estimate E-value = 4.27; lower bound E-value = 3.29). Conclusion: The TyG×WC composite indicator is an independent predictor of CVD risk, with more pronounced effects observed in women and the general population. The association between TyG×WC and CVD risk demonstrates a stable and progressive trend across quartiles and is consistent across different populations. The inclusion of TyG×WC enhances predictive accuracy (AUC, NRI, IDI) and clinical utility (DCA), suggesting strong generalizability and practical application. This indicator may serve as a valuable tool for screening high-risk individuals and guiding CVD prevention strategies.
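
The indicator construction described in this abstract (TyG index, multiplication by waist circumference, z-score standardization, quartile grouping) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant values are hypothetical, waist circumference is assumed to be in centimetres (the abstract does not give the unit), the quartile cut points use the default method of `statistics.quantiles`, which may differ from the authors' convention, and all function names are introduced here rather than taken from the study.

```python
import math
import statistics


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """TyG index as written in the abstract: ln[TG (mg/dL) x FPG (mg/dL) / 2]."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def tyg_wc(triglycerides_mg_dl, glucose_mg_dl, waist_cm):
    """TyG index multiplied by waist circumference (unit assumed: cm)."""
    return tyg_index(triglycerides_mg_dl, glucose_mg_dl) * waist_cm


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def quartile_labels(values):
    """Assign each value to quartile 1 to 4 using sample quartile cut points."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    labels = []
    for v in values:
        if v <= q1:
            labels.append(1)
        elif v <= q2:
            labels.append(2)
        elif v <= q3:
            labels.append(3)
        else:
            labels.append(4)
    return labels


# Hypothetical participants: (triglycerides mg/dL, fasting glucose mg/dL, waist cm).
participants = [
    (90, 85, 78), (110, 92, 84), (130, 98, 88), (150, 100, 92),
    (170, 108, 96), (200, 115, 101), (240, 126, 106), (300, 140, 112),
]

raw = [tyg_wc(tg, glucose, wc) for tg, glucose, wc in participants]
z = z_scores(raw)
labels = quartile_labels(z)

for value, score, label in zip(raw, z, labels):
    print(f"TyG x WC = {value:7.1f}, z = {score:+.2f}, quartile = Q{label}")
```

Because standardization is a monotone transformation, grouping the raw or the standardized values yields the same quartile membership; the z-scores matter only when the indicator enters a model as a continuous variable.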

  • Research Article
  • Citations: 46
  • 10.1097/shk.0000000000000892
Age Shock Index is Superior to Shock Index and Modified Shock Index for Predicting Long-Term Prognosis in Acute Myocardial Infarction.
  • Nov 1, 2017
  • Shock
  • Tongtong Yu + 5 more

Shock index (SI) has been reported to help predict adverse prognosis in patients with acute myocardial infarction (AMI) undergoing percutaneous coronary intervention (PCI). However, the prognostic value of age SI and modified shock index (MSI) in AMI patients undergoing PCI is unknown. Moreover, the prognostic performance of admission age SI has not been compared with SI, MSI, and the Global Registry of Acute Coronary Events (GRACE) risk score. One thousand eight hundred sixty-four AMI patients undergoing PCI were analyzed in a retrospective cohort study. The clinical endpoint was all-cause mortality. The predictive performance of new models was assessed by C-statistic, Hosmer-Lemeshow test, Nagelkerke R², Brier scores, integrated discrimination improvement (IDI), and net reclassification improvement (NRI). Multivariate analysis showed that higher age SI and MSI were both associated with a higher rate of all-cause mortality [age SI: hazard ratio (HR) = 1.025, 95% CI = 1.010-1.040, P = 0.001; MSI: HR = 2.902, 95% CI = 1.180-7.137, P = 0.020]. The prognostic performance of admission age SI was similar to that of the GRACE system for predicting all-cause mortality (C-statistic: z = 0.437, P = 0.662; IDI: -0.005, P = 0.474; NRI: -0.028, P = 0.257), but better than that of admission SI (C-statistic: z = 3.944, P < 0.001; IDI: 0.012, P = 0.016; NRI: 0.472, P < 0.001) and admission MSI (C-statistic: z = 3.214, P = 0.001; IDI: 0.011, P = 0.001; NRI: 0.561, P < 0.001). Age SI alone can identify patients at high risk of death among AMI patients undergoing PCI. It is similar to GRACE but better than SI and MSI for predicting all-cause mortality. However, age SI is easier to calculate than GRACE.

  • Research Article
  • Citations: 14
  • 10.3389/fonc.2020.00143
Latent Risk Intrahepatic Cholangiocarcinoma Susceptible to Adjuvant Treatment After Resection: A Clinical Deep Learning Approach
  • Feb 19, 2020
  • Frontiers in Oncology
  • Seogsong Jeong + 13 more

Background: Artificial Intelligence (AI) frameworks have emerged as a novel approach in medicine. However, information regarding its applicability and effectiveness in a clinical prognostic factor setting remains unclear. Methods: The AI framework was derived from a pooled dataset of intrahepatic cholangiocarcinoma (ICC) patients from three clinical centers (n = 1,421) by applying the TensorFlow deep learning algorithm to Cox-indicated pathologic (four), serologic (six), and etiologic (two) factors; this algorithm was validated using a dataset of ICC patients from an independent clinical center (n = 234). The model was compared to the commonly used staging system (American Joint Committee on Cancer; AJCC) and methodology (Cox regression) by evaluating the Brier score (BS), integrated discrimination improvement (IDI), net reclassification improvement (NRI), and area under curve (AUC) values. Results: The framework (BS, 0.17; AUC, 0.78) was found to be more accurate than the AJCC stage (BS, 0.48; AUC, 0.60; IDI, 0.29; NRI, 11.85; P < 0.001) and the Cox model (BS, 0.49; AUC, 0.70; IDI, 0.46; NRI, 46.11; P < 0.001). Furthermore, hazard ratios greater than three were identified in both overall survival (HR, 3.190; 95% confidence interval [CI], 2.150–4.733; P < 0.001) and disease-free survival (HR, 3.559; 95% CI, 2.500–5.067; P < 0.001) between latent risk and stable groups in validation. In addition, the latent risk subgroup was found to benefit significantly from adjuvant treatment (HR, 0.459; 95% CI, 0.360–0.586; P < 0.001). Conclusions: The AI framework seems promising in the prognostic estimation and stratification of susceptible individuals for adjuvant treatment in patients with ICC after resection. Future prospective validations are needed for the framework to be applied in clinical practice.

  • Abstract
  • Citations: 2
  • 10.1016/j.chest.2022.08.2121
AUTOMATED MACHINE LEARNING WITH AUTOGLUON TO PREDICT POSTOPERATIVE PNEUMONIA USING THE AMERICAN COLLEGE OF SURGEONS’ NATIONAL SURGICAL QUALITY IMPROVEMENT PROGRAM DATABASE
  • Oct 1, 2022
  • Chest
  • Kenneth Brill + 12 more


  • Front Matter
  • Citations: 1
  • 10.1053/j.ajkd.2011.11.011
Genetic Risk Prediction for CKD: A Journey of a Thousand Miles
  • Dec 14, 2011
  • American Journal of Kidney Diseases
  • Jeffrey B Kopp + 1 more


  • Research Article
  • Citations: 1
  • 10.1007/s11739-024-03672-x
Machine learning predictions of the adverse events of different treatments in patients with ischemic left ventricular systolic dysfunction.
  • Jun 14, 2024
  • Internal and emergency medicine
  • Wenjie Chen + 2 more

This study aimed to develop several new machine learning models based on hibernating myocardium to predict the major adverse cardiac events (MACE) of ischemic left ventricular systolic dysfunction (LVSD) patients receiving either percutaneous coronary intervention (PCI) or optimal medical therapy (OMT). This study included 329 LVSD patients, who were randomly assigned to the training or validation cohort. Least absolute shrinkage and selection operator (LASSO) regression was used to identify variables associated with MACE. Subsequently, various machine learning models were established. Model performance was compared using receiver operating characteristic (ROC) curves, the Brier score (BS), and the concordance index (C-index). A total of 329 LVSD patients were retrospectively enrolled between January 2016 and December 2021. Utilizing LASSO regression analysis, five factors were selected. Based on these factors, RSF, GBM, XGBoost, Cox, and DeepSurv models were constructed. In the development and validation cohorts, the C-indices were 0.888 vs. 0.955 (RSF). The RSF model (0.991 vs. 0.982 vs. 0.980) had the highest area under the ROC curve (AUC) compared with the other models. The BS values (0.077 vs. 0.095 vs. 0.077) of the RSF model were below 0.25 at 12, 18, and 24 months. This study developed a novel predictive model based on RSF to predict MACE in LVSD patients who underwent either PCI or OMT.

  • Abstract
  • 10.1182/blood-2023-185881
Novel Causal Inference Method Estimates Treatment Effects of Contemporary Drugs in a Global Cohort of Patients with Relapsed and Refractory Mature T-Cell and NK-Cell Neoplasms
  • Nov 2, 2023
  • Blood
  • Min Ji Koh + 49 more


  • Research Article
  • 10.1093/ndt/gfaf116.058
#1198 Deep learning survival model to predict renal outcome for chronic kidney disease (CKD) patients: a retrospective cohort study
  • Oct 21, 2025
  • Nephrology Dialysis Transplantation
  • Ka Chun Leung + 1 more

Background and Aims: Chronic kidney disease (CKD) affects over 10% of the global population, imposing significant health and economic burdens [1]. Predictive models like the Kidney Failure Risk Equation (KFRE) aid in estimating the 2- and 5-year probability of kidney failure but often fall short in diverse populations, including those in South-East Asia [2, 3]. This study aims to develop a patient-centric, artificial intelligence (AI)-driven predictive model using South-East Asian data to enhance the prediction of renal replacement therapy (RRT) initiation and mortality in CKD patients with an estimated glomerular filtration rate (eGFR) <60 ml/min/1.73 m². Method: A multi-centre, retrospective cohort study was conducted across three acute hospitals in Hong Kong. Data from 2009 to 2023 were retrieved from the Clinical Data Analysis and Reporting System (CDARS), including demographics, biochemistry, and ICD-10 codes. Patients with eGFR <60 ml/min/1.73 m² on two separate tests at least 3 months apart were included, excluding those with prior renal replacement therapy or transplantation. Missing data were imputed using Multivariable Imputation by Chained Equations (MICE), and data were preprocessed with one-hot encoding, log transformations, and scaling. Fourteen AI survival models incorporating deep learning techniques (ANN and LSTM) were trained using balanced datasets generated via synthetic oversampling methods and their predictions were ensembled by averaging. Model performance was evaluated with the concordance index, Brier score and index of prediction accuracy (IPA) calculated by the Brier score [4]. Results: Data from 34,253 CKD patients were analyzed (28,866 and 5,387 patients in the development and validation cohorts respectively). Our model showed excellent performance at 1, 2 and 3 years (Fig. 1), while the performance was less optimal at 4 and 5 years. KFRE appeared to show better performance than our model in predicting renal failure at 5 years (Figs 2 and 3). The ensembled survival models demonstrated robust predictive capabilities. For RRT initiation, the concordance index was 0.98 (95% CI: 0.984–0.985, p < 0.0001) with a Brier score of 0.03 (95% CI: 0.036–0.037, p < 0.0001). For all-cause mortality, the concordance index was 0.82 (95% CI: 0.822–0.824, p < 0.0001) with a Brier score of 0.09 (95% CI: 0.090–0.091, p < 0.0001). Calibration plots confirmed the accuracy of predictions across all risk quantiles. Conclusion: AI-driven models show good potential to predict CKD progression and mortality with high accuracy, offering a personalized approach to patient management. The integration of such tools into clinical practice may optimize care pathways, improve patient outcomes, and reduce the burden on healthcare systems. Further validation across broader populations is warranted to enhance generalizability.

  • Research Article
  • 10.3389/fmed.2025.1655302
Personalized prediction model for scar response after radionuclide therapy: development and validation in a Chinese cohort
  • Oct 13, 2025
  • Frontiers in Medicine
  • Jinzhao Su + 6 more

Background: Scarring represents a persistent clinical and psychosocial challenge, with considerable variability in treatment response among patients. While both clinical and morphologic factors can influence outcomes, robust, individualized prediction of scar treatment efficacy remains elusive. Objective: To develop and validate an integrated predictive model for scar treatment outcomes using a combination of clinical and image-derived features in a Chinese cohort, and to translate this model into a web-based calculator for practical clinical application. This model requires validation in other ethnicities. Methods: We retrospectively analyzed 117 Chinese patients with scars treated at a single center, dividing them into a training (n = 83) and validation cohort (n = 34). Clinical data (including age, scar height) and quantitative features extracted from standardized scar photographs (solidity and mean saturation [S_mean]) were used to construct clinical, image-based, and combined predictive models. Feature selection was performed via LASSO regression, and models were developed using multivariate logistic regression. Model performance was evaluated using area under the receiver operating characteristic curve (AUC), calibration metrics (Brier score, log loss, HL test), and decision curve analysis (DCA). Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were calculated. A user-friendly web calculator was subsequently developed. Results: Scar height and age (clinical factors) as well as solidity and S_mean (image-derived metrics) were identified as independent predictors of poor treatment outcome. The combined model demonstrated superior discrimination (AUC 0.970 [training], 0.908 [test]), calibration, and clinical utility compared to clinical or image-based models alone. Calibration curves and metrics indicated excellent agreement between predicted and observed probabilities for the combined model. DCA, NRI, and IDI analyses further highlighted the incremental value and net benefit of the integrated approach. A web-based calculator was developed to enable individualized outcome prediction and support clinical decision-making. Conclusion: Integration of clinical and image-derived features enables robust, individualized prediction of scar treatment outcomes in this Chinese cohort. Our validated combined model, accessible via an easy-to-use web-based calculator, may enhance treatment planning, risk stratification, and patient counseling in scar management. Validation in diverse ethnic populations is essential.

  • Research Article
  • Citations: 1
  • 10.3390/bioengineering12050511
Development, Validation, and Deployment of a Time-Dependent Machine Learning Model for Predicting One-Year Mortality Risk in Critically Ill Patients with Heart Failure.
  • May 12, 2025
  • Bioengineering (Basel, Switzerland)
  • Jiuyi Wang + 5 more

Background: Heart failure (HF) ranks among the foremost causes of mortality globally, exhibiting particularly high prevalence and significant impact within intensive care units (ICUs). This study sought to develop, validate, and deploy a time-dependent machine learning model aimed at predicting the one-year all-cause mortality risk in ICU patients diagnosed with HF, thereby facilitating precise prognostic evaluation and risk stratification. Methods: This study encompassed a cohort of 8960 ICU patients with HF sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database (version 3.1). This latest version of the database added data from 2020 to 2022 on the basis of version 2.2 (covering data from 2008 to 2019); therefore, data spanning 2008 to 2019 (n = 5748) were designated for the training set, while data from 2020 to 2022 (n = 3212) were reserved for the test set. The primary endpoint of interest was one-year all-cause mortality. Least Absolute Shrinkage and Selection Operator (LASSO) regression was employed to select predictive features from an initial pool of 64 candidate variables (including demographic characteristics, vital signs, comorbidities and complications, therapeutic interventions, routine laboratory data, and disease severity scores). Four predictive models were developed and compared: Cox proportional hazards, random survival forest (RSF), Cox proportional hazards deep neural network (DeepSurv), and eXtreme Gradient Boosting (XGBoost). Model performance was assessed using the concordance index (C-index) and Brier score, with model interpretability addressed through SHapley Additive exPlanations (SHAP) and time-dependent Survival SHapley Additive exPlanations (SurvSHAP(t)). Results: This study revealed a one-year mortality rate of 46.1% within the population under investigation. In the training set, LASSO effectively identified 24 features in the model. 
In the test set, the XGBoost model exhibited superior predictive performance, as evidenced by a C-index of 0.772 and a Brier score of 0.161, outperforming the Cox model (C-index: 0.740, Brier score: 0.175), the RSF model (C-index: 0.747, Brier score: 0.178), and the DeepSurv model (C-index: 0.723, Brier score: 0.183). Decision curve analysis validated the clinical utility of the XGBoost model across a broad spectrum of risk thresholds. Feature importance analysis identified the red cell distribution width-to-albumin ratio (RAR), Charlson Comorbidity Index, Simplified Acute Physiology Score II (SAPS II), Acute Physiology Score III (APS III), and the age-bilirubin-INR-creatinine (ABIC) score as the top five predictive factors. Consequently, an online risk prediction tool based on this model has been developed and is publicly accessible. Conclusions: The time-dependent XGBoost model demonstrated robust predictive capability in evaluating the one-year all-cause mortality risk in critically ill HF patients. This model offered a useful tool for early risk identification and supported timely interventions.
