Development of machine learning models with explainable AI for frailty risk prediction and their web-based application in community public health
Background: Frailty is a public health concern linked to falls, disability, and mortality. Early screening and tailored interventions can mitigate adverse outcomes, but community settings require tools that are both accurate and explainable. Korea is entering a super-aged phase, yet few approaches have used nationally representative survey data.
Objective: This study aimed to identify key predictors of frailty risk, defined by the K-FRAIL scale, using explainable machine learning (ML) based on data from the 2023 National Survey of Older Koreans (NSOK), and to develop and internally validate prediction models. To demonstrate the potential applicability of these models in community public health and clinical practice, a web-based application was implemented.
Methods: Data from 10,078 older adults were analyzed, with frailty defined by the K-FRAIL scale (robust = 0, pre-frail = 1–2, and frail = 3–5). A total of 132 candidate variables were constructed through selection and derivation. Using CatBoost with out-of-fold (OOF) SHapley Additive exPlanations (SHAP, a game-theoretic approach to quantifying feature contributions), 15 key predictors were identified and applied across 10 algorithms under nested cross-validation (CV). Model performance was evaluated using the receiver operating characteristic area under the curve (ROC-AUC), precision–recall area under the curve (PR-AUC), F1-score, balanced accuracy, and the Brier score. To assess feasibility, a single-page bilingual web application integrating the CatBoost inference pipeline for offline use was developed.
Results: SHAP analysis identified depression score, age, instrumental activities of daily living (IADL) count, sleep quality, and cognition as the leading predictors, followed by smartphone use, number of medications, province, driving status, hospital use, physical activity, osteoporosis, eating alone, digital adaptation difficulty, and sex, yielding 15 key predictors across the mental, functional, lifestyle, social, and digital domains. Using these predictors, boosting models outperformed other algorithms, with CatBoost achieving the best performance (ROC-AUC = 0.813 ± 0.014; PR-AUC = 0.748 ± 0.019).
Conclusion: An explainable machine learning model with strong discrimination and adequate calibration was developed, accompanied by a lightweight web application for potential use in community and clinical settings. However, external validation, recalibration, and subgroup fairness assessments are needed to ensure generalizability and clinical adoption.
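SHAP's attributions come from the Shapley value of cooperative game theory. As a toy illustration of the definition that tree-based SHAP computes efficiently (a hypothetical example, not the study's actual pipeline), the sketch below enumerates all feature coalitions for a three-feature model, holding absent features at a baseline:

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for model f at point x.

    Features outside a coalition S are set to baseline values
    (an interventional value function). Exponential in the number
    of features, so only suitable for toy examples.
    """
    n = len(x)
    phi = np.zeros(n)

    def v(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Classic Shapley weight: |S|! (n-|S|-1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

# Toy additive "risk model": for additive models the Shapley value
# of feature i is exactly x_i minus its baseline value.
f = lambda z: 2.0 * z[0] + z[1] - 0.5 * z[2]
x = np.array([1.0, 3.0, 2.0])
baseline = np.array([0.0, 1.0, 2.0])
phi = shapley_values(f, x, baseline)
```

By the efficiency property, the attributions always sum to f(x) − f(baseline), which is what makes per-patient SHAP profiles additive and comparable.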
- Research Article
- 10.1136/bmjopen-2025-108527
- Nov 1, 2025
- BMJ Open
Objectives: Methotrexate (MTX) effectively controls rheumatoid arthritis (RA) but often leads to side effects (SE) such as gastrointestinal (GI) issues, liver toxicity and bone marrow suppression. We sought to develop clinically interpretable machine learning (ML) models that accurately predict MTX-related SE in patients with RA taking MTX, enhancing predictive accuracy and identifying patient-specific risk factors using explainable artificial intelligence (XAI), thereby enabling transparent clinical interpretation. We specifically sought to address the unmet need for individualised risk stratification using real-world, multicentre observational data.
Design: Retrospective case-control study.
Setting: 23 rheumatology clinics in South Korea, based on data from a nationwide multicentre cohort.
Participants: A total of 5077 patients with RA were initially enrolled from the Korean Observational Study Network for Arthritis. After excluding those with missing clinical, demographic or prescription data and those not receiving MTX, 2375 patients remained eligible. Among these, 1654 and 1218 patients were included in the overall SE and GI SE analysis groups, respectively, after 1:1 propensity score matching. All patients were aged ≥18 years and met the 1987 American College of Rheumatology classification criteria.
Primary and secondary outcome measures: The primary outcome was the presence of SE in patients with RA taking MTX, categorised into overall SE and GI SE, based on standardised patient questionnaires and clinical assessments. The secondary outcome was the identification of key predictors using SHapley Additive exPlanations (SHAP) to enhance the interpretability of ML predictions.
Results: Among six ML classifiers, extreme gradient boosting demonstrated the highest performance in predicting overall SE (area under the curve (AUC) 0.781, F1 score 0.672, area under the precision-recall curve (AUPRC) 0.757) and GI SE (AUC 0.701, F1 score 0.690, AUPRC 0.670). SHAP analysis identified key predictive features including age, physician visual analogue scale score, alanine aminotransferase, Health Assessment Questionnaire score, celecoxib use and drug adherence. Logistic regression confirmed statistical significance for multiple variables (eg, OR 4.63; 95% CI 1.41 to 20.90 for non-adherence >30 days; OR 1.45; 95% CI 1.14 to 1.85 for celecoxib use). DeLong’s test indicated that boosting models significantly outperformed the support vector machine (p<0.001).
Conclusions: Interpretable ML models using real-world clinical data can accurately predict SE in patients with RA taking MTX. These models may facilitate early identification of high-risk individuals and inform personalised treatment strategies. Integration into clinical decision support systems could improve MTX safety monitoring. Further prospective validation in external cohorts is warranted.
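The 1:1 propensity score matching used to build the analysis groups can be sketched as below. This is a minimal illustration on synthetic data; greedy nearest-neighbour matching without replacement is an assumption, since the abstract does not state the exact matching algorithm or caliper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic cohort: covariates X and a treatment indicator t
# (e.g. a co-prescription whose side-effect risk we want to isolate).
X = rng.normal(size=(200, 3))
t = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

# Step 1: estimate propensity scores P(t = 1 | X).
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbour matching without replacement.
treated = np.where(t == 1)[0]
control = list(np.where(t == 0)[0])
pairs = []
for i in treated:
    if not control:
        break
    j = min(control, key=lambda c: abs(ps[i] - ps[c]))
    pairs.append((i, j))
    control.remove(j)
```

After matching, covariate balance between the paired groups should be checked (e.g. via standardised mean differences) before comparing outcomes.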
- Research Article
- 10.1001/jamanetworkopen.2022.12930
- May 25, 2022
- JAMA Network Open
Cytoreductive surgery (CRS) is one of the most complex operations in surgical oncology with significant morbidity, and improved risk prediction tools are critically needed. Machine learning models can potentially overcome the limitations of traditional multiple logistic regression (MLR) models and provide accurate risk estimates. The aim was to develop and validate an explainable machine learning model for predicting major postoperative complications in patients undergoing CRS. This prognostic study used patient data from tertiary care hospitals with expertise in CRS included in the US Hyperthermic Intraperitoneal Chemotherapy Collaborative Database between 1998 and 2018. Information from 147 variables was extracted to predict the risk of a major complication. An ensemble-based machine learning (gradient-boosting) model was optimized on 80% of the sample with subsequent validation on a 20% holdout data set. The machine learning model was compared with traditional MLR models. The artificial intelligence SHAP (Shapley additive explanations) method was used for interpretation of patient- and cohort-level risk estimates and interactions to define novel surgical risk phenotypes. Data were analyzed between November 2019 and August 2021. The exposure was cytoreductive surgery, and the main outcome measures were the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Data from a total of 2372 patients were included in model development (mean age, 55 years [range, 11-95 years]; 1366 [57.6%] women). The optimized machine learning model achieved high discrimination (AUROC: mean cross-validation, 0.75 [range, 0.73-0.81]; test, 0.74) and precision (AUPRC: mean cross-validation, 0.50 [range, 0.46-0.58]; test, 0.42). Compared with the optimized machine learning model, the published MLR model performed worse (test AUROC and AUPRC: 0.54 and 0.18, respectively). 
Higher volume of estimated blood loss, having pelvic peritonectomy, and longer operative time were the top 3 contributors to the high likelihood of major complications. SHAP dependence plots demonstrated insightful nonlinear interactive associations between predictors and major complications. For instance, high estimated blood loss (ie, above 500 mL) was only detrimental when operative time exceeded 9 hours. Unsupervised clustering of patients based on similarity of sources of risk allowed identification of 6 distinct surgical risk phenotypes. In this prognostic study using data from patients undergoing CRS, an optimized machine learning model demonstrated a superior ability to predict individual- and cohort-level risk of major complications vs traditional methods. Using the SHAP method, 6 distinct surgical phenotypes were identified based on sources of risk of major complications.
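The "surgical risk phenotype" idea — grouping patients by the sources of their risk rather than by raw covariates — can be sketched by running k-means on a per-patient attribution matrix. The attribution values below are synthetic stand-ins for real SHAP output, with two fabricated phenotypes for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Stand-in for a (patients x features) SHAP matrix: each row holds one
# patient's per-feature risk contributions. Two synthetic "phenotypes":
# blood-loss-driven risk vs. operative-time-driven risk.
blood_driven = rng.normal(loc=[2.0, 0.0, 0.0], scale=0.3, size=(50, 3))
time_driven = rng.normal(loc=[0.0, 2.0, 0.0], scale=0.3, size=(50, 3))
shap_matrix = np.vstack([blood_driven, time_driven])

# Cluster patients by how their risk arises, not by who they are.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(shap_matrix)
labels = km.labels_
```

The study reported six phenotypes; the number of clusters is a modelling choice that in practice would be selected with a criterion such as the silhouette score.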
- Research Article
- 10.1093/bjd/ljaf085.030
- Jun 27, 2025
- British Journal of Dermatology
Evidence-based precision medicine strategies do not currently exist to guide the choice of biologics in the treatment of psoriasis. As a result, a costly and arduous trial-and-error approach is often adopted. Artificial intelligence has the potential to improve personalization through the prediction of treatment outcomes using real-world data, such as that within the British Association of Dermatologists Biologics and Immunomodulators Register (BADBIR). We aimed to develop an explainable machine learning (ML) model to predict biologic drug discontinuation in a biologic-naive psoriasis cohort using BADBIR data. BADBIR data (2007–2024) were engineered to enable readability. Adult biologic-naive patients across all biologic cohorts with > 6 months of follow-up data were included. Recruitment centres representing 10% of the overall cohort were randomly separated for external validation (model testing). The residual cohort was then randomly split for model training (80%) and internal validation (20%, for hyperparameter tuning). Random forest modelling was applied for imputation of missing data. Only clinical data at baseline prior to biologic initiation were used for model training to enhance future clinical utilization. The performance of several ML (XGBoost, AdaBoost, random forest) and deep learning (simple and recurrent neural networks) algorithms was evaluated. External validation was performed with a cross-validation leave-group-out approach of individual recruitment centres. SHAP (SHapley Additive exPlanations) and permutation feature importance values were generated to understand model predictions. In total, 10 806 patients were included, in the cohorts for training (n = 7722), internal validation (n = 1930) and external validation (for final model testing: nine centres, n = 1154). Most patients (n = 7290, 67%) discontinued initial biologic therapy within their follow-up duration (median 6.6 years). 
Within the discontinuation cohort, adalimumab (originator and biosimilars, 57%) was most prescribed. Higher proportions of female patients (43% vs. 37%) and patients with psoriatic arthritis (21% vs. 17%) and scalp psoriasis (59% vs. 51%) were noted in the discontinuation vs. the continuation cohort, respectively. AdaBoost, an ensemble ML model, outperformed other evaluated models with regards to area under the receiver operating characteristic curve (AUROC). Model testing predicted discontinuation of biologic therapy with (mean, 95% confidence interval) precision 0.85 (0.83–0.88), recall 0.80 (0.78–0.83), F1 score 0.82, AUROC 0.76 (0.71–0.78) and area under the precision recall curve (AUPRC) 0.83 (0.81–0.86). Performance metrics following testing with cross-validation [mean (SD)] were precision 0.79 (0.09), recall 0.69 (0.2), F1 score 0.74 (0.16), AUROC 0.71 (0.06) and AUPRC 0.75 (0.11). The features contributing most significantly to model performance were initial biologic drug, baseline Psoriasis Area and Severity Index, patient age, recruitment centre and baseline white cell count. In conclusion, AdaBoost represents an explainable, ML model with potential clinical utility to predict treatment outcomes of patients with psoriasis using real-world registry data. Future work will investigate discontinuation risk across a range of individual biologic therapies.
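The leave-group-out external validation by recruitment centre corresponds to scikit-learn's LeaveOneGroupOut splitter. A minimal sketch on synthetic data, assuming one centre is held out per fold (which the abstract implies but does not fully specify):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(1)

# Toy registry: 90 patients drawn from 3 recruitment centres.
X = rng.normal(size=(90, 4))
y = rng.integers(0, 2, size=90)
centre = np.repeat([0, 1, 2], 30)

# Each fold trains on all centres but one and tests on the held-out
# centre, so test patients never share a centre with training patients.
logo = LeaveOneGroupOut()
splits = list(logo.split(X, y, groups=centre))
```

Grouped splitting is what makes this a test of transfer to unseen centres, rather than an optimistic patient-level shuffle.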
- Research Article
- 10.1016/j.xops.2024.100584
- Jul 20, 2024
- Ophthalmology Science
Predicting Choroidal Nevus Transformation to Melanoma Using Machine Learning
- Research Article
- 10.31083/j.rcm2506203
- May 31, 2024
- Reviews in cardiovascular medicine
Readmission of elderly angina patients has become a serious problem, with a dearth of available prediction tools for readmission assessment. The objective of this study was to develop a machine learning (ML) model that can predict 180-day all-cause readmission for elderly angina patients. The clinical data for elderly angina patients was retrospectively collected. Five ML algorithms were used to develop prediction models. Area under the receiver operating characteristic curve (AUROC), area under the precision recall curve (AUPRC), and the Brier score were applied to assess predictive performance. Analysis by Shapley additive explanations (SHAP) was performed to evaluate the contribution of each variable. A total of 1502 elderly angina patients (45.74% female) were enrolled in the study. The extreme gradient boosting (XGB) model showed good predictive performance for 180-day readmission (AUROC = 0.89; AUPRC = 0.91; Brier score = 0.21). SHAP analysis revealed that the number of medications, hematocrit, and chronic obstructive pulmonary disease were important variables associated with 180-day readmission. An ML model can accurately identify elderly angina patients with a high risk of 180-day readmission. The model used to identify individual risk factors can also serve to remind clinicians of appropriate interventions that may help to prevent the readmission of patients.
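The Brier score reported above is simply the mean squared difference between predicted probabilities and observed binary outcomes; a minimal reference implementation:

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and binary
    outcomes; 0 is perfect, and 0.25 matches an uninformative constant
    prediction of 0.5 on a balanced outcome."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

# Confident, correct predictions score close to 0.
score = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```

Unlike ROC-AUC, the Brier score is sensitive to calibration as well as discrimination, which is why studies report both.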
- Research Article
- 10.3389/fendo.2024.1292346
- Jan 25, 2024
- Frontiers in Endocrinology
Insulin plays a central role in the regulation of energy and glucose homeostasis, and insulin resistance (IR) is widely considered the "common soil" of a cluster of cardiometabolic disorders. Assessment of insulin sensitivity is therefore important in preventing and treating IR-related disease. This study aims to develop and validate machine learning (ML)-augmented algorithms for insulin sensitivity assessment in community and primary care settings. We analyzed data from 9358 participants over 40 years old in the population-based cohort of the Hubei center of the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals). Three non-ensemble algorithms and four ensemble algorithms were used to develop models with 70 non-laboratory variables for the community setting and 87 variables (70 non-laboratory and 17 laboratory) for the primary care setting, in order to identify the best-performing classifier. The best-performing models were further streamlined using the top-ranked 5, 8, 10, 13, 15, and 20 features. The performance of these ML models was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and the Brier score. Shapley additive explanation (SHAP) analysis was employed to evaluate feature importance and interpret the models. The LightGBM models developed for the community (AUROC 0.794, AUPR 0.575, Brier score 0.145) and primary care settings (AUROC 0.867, AUPR 0.705, Brier score 0.119) achieved higher performance than the models constructed by the other six algorithms. The streamlined LightGBM models for the community (AUROC 0.791, AUPR 0.563, Brier score 0.146) and primary care settings (AUROC 0.863, AUPR 0.692, Brier score 0.124) using the 20 top-ranked variables also showed excellent performance. 
SHAP analysis indicated that the top-ranked features included fasting plasma glucose (FPG), waist circumference (WC), body mass index (BMI), triglycerides (TG), gender, waist-to-height ratio (WHtR), the number of daughters born, and resting pulse rate (RPR). The LightGBM models accurately predict insulin sensitivity in the community and primary care settings and might become an efficient and practical tool for insulin sensitivity assessment in these settings.
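The streamlining step — refit on only the top-ranked features — can be sketched as follows. This illustration uses scikit-learn's GradientBoostingClassifier and its impurity-based importances as stand-ins for the study's LightGBM models and SHAP-based ranking:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the full candidate variable set.
X, y = make_classification(n_samples=600, n_features=30, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: fit the full model and rank features by importance.
full = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
top_k = np.argsort(full.feature_importances_)[::-1][:5]

# Step 2: refit a streamlined model on the top-ranked features only.
slim = GradientBoostingClassifier(random_state=0).fit(X_tr[:, top_k], y_tr)
auc_slim = roc_auc_score(y_te, slim.predict_proba(X_te[:, top_k])[:, 1])
```

A streamlined model that retains nearly all of the full model's discrimination — as in the 20-variable models above — is far easier to deploy as a questionnaire-style screening tool.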
- Research Article
- 10.1200/cci.23.00247
- Apr 1, 2024
- JCO clinical cancer informatics
Preoperative prediction of postoperative complications (PCs) in inpatients with cancer is challenging. We developed an explainable machine learning (ML) model to predict PCs in a heterogeneous population of inpatients with cancer undergoing same-hospitalization major operations. Consecutive inpatients who underwent same-hospitalization operations from December 2017 to June 2021 at a single institution were retrospectively reviewed. The ML model was developed and tested using electronic health record (EHR) data to predict 30-day PCs for patients with Clavien-Dindo grade 3 or higher (CD 3+) per the CD classification system. Model performance was assessed using area under the receiver operating characteristic curve (AUROC), area under the precision recall curve (AUPRC), and calibration plots. Model explanation was performed using the Shapley additive explanations (SHAP) method at cohort and individual operation levels. A total of 988 operations in 827 inpatients were included. The ML model was trained using 788 operations and tested using a holdout set of 200 operations. The CD 3+ complication rates were 28.6% and 27.5% in the training and holdout test sets, respectively. Training and holdout test sets' model performance in predicting CD 3+ complications yielded an AUROC of 0.77 and 0.73 and an AUPRC of 0.56 and 0.52, respectively. Calibration plots demonstrated good reliability. The SHAP method identified features and the contributions of the features to the risk of PCs. We trained and tested an explainable ML model to predict the risk of developing PCs in patients with cancer. Using patient-specific EHR data, the ML model accurately discriminated the risk of developing CD 3+ complications and displayed top features at the individual operation and cohort level.
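The calibration plots used to assess reliability pair binned mean predicted probabilities with observed event rates; a minimal sketch with scikit-learn on synthetic data (not the study's EHR cohort):

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Reliability-diagram data: the observed event rate in each
# predicted-probability bin; a well-calibrated model tracks the
# diagonal (observed rate equals predicted probability).
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=5)
```

Plotting `frac_pos` against `mean_pred` (plus the identity line) reproduces the familiar calibration plot reported in clinical prediction studies.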
- Research Article
- 10.1016/j.cmpb.2024.108561
- Mar 1, 2025
- Computer methods and programs in biomedicine
Machine learning-based predictive models for perioperative major adverse cardiovascular events in patients with stable coronary artery disease undergoing noncardiac surgery.
- Research Article
- 10.1161/circ.152.suppl_3.4342997
- Nov 4, 2025
- Circulation
Background: Cardiovascular disease (CVD) is a global health concern. Traditional models often miss nonlinear dependencies among physiological and behavioral factors. Transformer-based deep learning can capture complex patterns in structured health data. We hypothesized that such a model, trained on large-scale check-up records, would improve long-term CVD risk prediction. Methods: Using annual health check-up data from Toyama Prefecture, Japan (n = 100,056; 2010–2024), we excluded individuals with baseline CVD. The outcome was time to incident CVD over 10 years, modeled as right-censored survival data. For external validation, we used data from Kanazawa City (n = 79,756). The Transformer model was trained using anthropometric, laboratory, and self-reported lifestyle data. Benchmark models included Cox regression, XGBoost survival embeddings, multilayer perceptron (MLP), the Framingham Risk Score (FRS), and the Hisayama Risk Score (HRS). Model performance was evaluated using C-index, time-dependent area under the curve (AUC), and precision-recall AUC (PR-AUC). Interpretability was assessed using SHapley Additive exPlanations (SHAP) and a Feature Attention Network (FAN), which visualizes directional relationships via Transformer attention weights. Attention was computed across all features, but only the top 12 ranked by SHAP were visualized to highlight key interactions. Results: There were 4,113 CVD events in the Toyama cohort. The Transformer achieved the best internal performance: C-index 0.796 (95% confidence interval [CI]: 0.790–0.802), 10-year AUC 0.821 (CI: 0.817–0.828), and PR-AUC 0.465 (CI: 0.456–0.475). In the Kanazawa cohort, performance remained strong (C-index 0.743; AUC 0.775; PR-AUC 0.504). SHAP identified age, electrocardiogram (ECG), antihypertensive medication, and sex as key predictors. FAN highlighted interpretable relationships—for example, weight gain shaped the model’s interpretation of age-related risk. 
Age was the most connected node in the attention network, linking behavioral and physiological features. Conclusion: The Transformer-based model outperformed conventional methods in both discrimination and calibration for long-term CVD risk prediction. Its consistent performance across distinct populations supports its utility in community-level risk stratification. By combining SHAP and FAN, the model reveals how modifiable behaviors influence physiological risk, supporting personalized prevention and public health strategies.
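The C-index used for evaluation measures, over comparable patient pairs, how often the patient who fails earlier was assigned the higher predicted risk. A minimal O(n²) reference implementation of Harrell's C-index for right-censored data:

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored survival data: among
    comparable pairs (where the earlier time is an observed event),
    count how often the earlier failure carries the higher predicted
    risk, with ties in risk scored as 0.5."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # The pair (i, j) is comparable only if i is observed to
            # fail strictly before j's (event or censoring) time.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Risk ordering perfectly reversed to survival time -> C-index = 1.
c = concordance_index(time=[2, 4, 6, 8], event=[1, 1, 1, 0],
                      risk=[0.9, 0.7, 0.5, 0.1])
```

Production survival libraries compute the same quantity with faster pairwise bookkeeping, but the definition is exactly this pair count.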
- Research Article
- 10.3389/fpubh.2025.1659987
- Sep 18, 2025
- Frontiers in Public Health
Background: Negative emotionality is a core dimension of infant temperament, characterized by heightened distress, reactivity, and difficulty with self-regulation. It has been consistently associated with later behavioral and emotional difficulties. Emerging evidence suggests that maternal mental health (MMH) in the postpartum period may influence infant temperament. However, few studies have applied machine learning (ML) methods to examine the predictive capacity of MMH profiles for early infant emotional development.
Objectives: This study aimed to investigate whether postpartum maternal depression, anxiety, and birth-related trauma, along with sociodemographic factors, can predict infant negative emotionality during the first year postpartum using tabular ML models.
Methods: Data were obtained from 410 mother–infant dyads. Infant temperament was assessed using the Negative Emotionality subscale of the Infant Behavior Questionnaire-Revised (IBQ-R). MMH symptoms were measured via the Edinburgh Postnatal Depression Scale (EPDS), the Hospital Anxiety and Depression Scale (HADS), and the City Birth Trauma Scale (City BiTS). Six tabular ML models were trained using MMH and demographic features: Tabular Prior-Data Fitted Network (TabPFN), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Random Forest, and Support Vector Machine (SVM). Performance was evaluated using receiver operating characteristic area under the curve (ROC-AUC), precision-recall area under the curve (PR-AUC), F1-score, sensitivity, and specificity.
Results: Postpartum MMH symptoms and maternal–infant characteristics moderately predicted infant negative emotionality. LightGBM achieved the highest performance across ROC-AUC (0.76), F1-score (0.72), sensitivity (0.71), and specificity (0.73). TabPFN yielded the highest PR-AUC (0.78). Key predictors included gestational age, infant's age, EPDS score, mother's age, HADS score, and City BiTS score.
Conclusions: These findings highlight the potential of ML tools in early identification of at-risk infants and the importance of integrating MMH screening into postnatal care. Such predictive insights can inform timely, personalized interventions that address the unique emotional needs of both mother and infant, ultimately fostering healthier developmental trajectories and enhancing overall family well-being.
- Research Article
- 10.2196/66733
- May 26, 2025
- Journal of Medical Internet Research
Background: Sepsis-associated liver injury (SALI) is a severe complication of sepsis that contributes to increased mortality and morbidity. Early identification of SALI can improve patient outcomes; however, sepsis heterogeneity makes timely diagnosis challenging. Traditional diagnostic tools are often limited, and machine learning techniques offer promising solutions for predicting adverse outcomes in patients with sepsis.
Objective: This study aims to develop an explainable machine learning model, incorporating stacking techniques, to predict the occurrence of liver injury in patients with sepsis and provide decision support for early intervention and personalized treatment strategies.
Methods: This retrospective multicenter cohort study adhered to the TRIPOD+AI (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis, Extended for Artificial Intelligence) guidelines. Data from 8834 patients with sepsis in the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used for training and internal validation, while data from 4236 patients in the eICU-Collaborative Research Database (eICU-CRD) database were used for external validation. SALI was defined as an international normalized ratio >1.5 and total bilirubin >2 mg/dL within 1 week of intensive care unit admission. Nine machine learning models—decision tree, random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), support vector machine, elastic net, logistic regression, multilayer perceptron, and k-nearest neighbors—were trained. A stacking ensemble model, using LightGBM, XGBoost, and RF as base learners and Lasso regression as the meta-model, was optimized via 10-fold cross-validation. Hyperparameters were tuned using grid search and Bayesian optimization. Model performance was evaluated using accuracy, balanced accuracy, Brier score, detection prevalence, F1-score, Jaccard index, κ coefficient, Matthews correlation coefficient, negative predictive value, positive predictive value, precision, recall, area under the receiver operating characteristic curve (ROC-AUC), precision-recall AUC, and decision curve analysis. Shapley additive explanations (SHAP) values were used to quantify feature importance.
Results: In the training set, LightGBM, XGBoost, and RF demonstrated the best performance among all models, with ROC-AUCs of 0.9977, 0.9311, and 0.9847, respectively. These models exhibited minimal variance in cross-validation, with tightly clustered ROC-AUC and precision-recall area under the curve distributions. In the internal validation set, LightGBM (ROC-AUC 0.8401) and XGBoost (ROC-AUC 0.8403) outperformed all other models, while RF achieved an ROC-AUC of 0.8193. In the external validation set, LightGBM (ROC-AUC 0.7077), XGBoost (ROC-AUC 0.7169), and RF (ROC-AUC 0.7081) maintained strong performance, although with slight decreases in ROC-AUC compared with the training set. The stacking model achieved ROC-AUCs of 0.995, 0.838, and 0.721 in the training, internal validation, and external validation sets, respectively. Key predictors—total bilirubin, lactate, prothrombin time, and mechanical ventilation status—were consistently identified across models, with SHAP analysis highlighting their significant contributions to the model’s predictions.
Conclusions: The stacking ensemble model developed in this study yields accurate and robust predictions of SALI in patients with sepsis, demonstrating potential clinical utility for early intervention and personalized treatment strategies.
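The stacking design — base learners producing out-of-fold predictions that a penalised linear meta-model combines — maps onto scikit-learn's StackingClassifier. The sketch below substitutes random forest and gradient boosting for the study's LightGBM/XGBoost/RF bases, and an L1-penalised logistic meta-model for Lasso (Lasso regression itself is not a classifier), so it illustrates the architecture rather than reproducing the study's model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners are fit on cross-validation folds so the meta-model
# sees out-of-fold predictions (cv=10 mirrors the study's 10-fold
# scheme), which guards against the meta-model overfitting.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(penalty="l1", solver="liblinear"),
    cv=10,
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
```

The L1 penalty on the meta-model plays the same role as Lasso in the study: it can shrink the weight of a redundant base learner toward zero.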
- Research Article
- 10.3389/fpsyt.2024.1376784
- Apr 16, 2024
- Frontiers in Psychiatry
The COVID-19 pandemic has exacerbated mental health challenges, particularly depression among college students. Detecting at-risk students early is crucial but remains challenging, particularly in developing countries. Utilizing data-driven predictive models presents a viable solution to address this pressing need. 1) To develop and compare machine learning (ML) models for predicting depression in Argentinean students during the pandemic. 2) To assess the performance of classification and regression models using appropriate metrics. 3) To identify key features driving depression prediction. A longitudinal dataset (N = 1492 college students) captured T1 and T2 measurements during the Argentinean COVID-19 quarantine. ML models, including linear logistic regression classifiers/ridge regression (LogReg/RR), random forest classifiers/regressors, and support vector machines/regressors (SVM/SVR), are employed. Assessed features encompass depression and anxiety scores (at T1), mental disorder/suicidal behavior history, quarantine sub-period information, sex, and age. For classification, models' performance on test data is evaluated using Area Under the Precision-Recall Curve (AUPRC), Area Under the Receiver Operating Characteristic curve, Balanced Accuracy, F1 score, and Brier loss. For regression, R-squared (R2), Mean Absolute Error, and Mean Squared Error are assessed. Univariate analyses are conducted to assess the predictive strength of each individual feature with respect to the target variable. The performance of multi- vs univariate models is compared using the mean AUPRC score for classifiers and the R2 score for regressors. The highest performance is achieved by SVM and LogReg (e.g., AUPRC: 0.76, 95% CI: 0.69, 0.81) and SVR and RR models (e.g., R2 for SVR and RR: 0.56, 95% CI: 0.45, 0.64 and 0.45, 0.63, respectively). 
Univariate models, particularly LogReg and SVM using depression (AUPRC: 0.72, 95% CI: 0.64, 0.79) or anxiety scores (AUPRC: 0.71, 95% CI: 0.64, 0.78) and RR using depression scores (R2: 0.48, 95% CI: 0.39, 0.57) exhibit performance levels close to those of the multivariate models, which include all features. These findings highlight the relevance of pre-existing depression and anxiety conditions in predicting depression during quarantine, underscoring their comorbidity. ML models, particularly SVM/SVR and LogReg/RR, demonstrate potential in the timely detection of at-risk students. However, further studies are needed before clinical implementation.
- Research Article
- 10.3390/sym17050794
- May 20, 2025
- Symmetry
Electrocardiogram (ECG) interpretation using deep learning models holds immense potential for improving cardiac diagnosis. However, existing models often suffer from overconfident predictions and lack the capability to directly quantify uncertainty, leading to unreliable clinical guidance. To address this challenge, we propose a model for uncertainty-aware ECG interpretation. The model employs a deep convolutional architecture with max-pooling residual modules to capture both local and global spatiotemporal features from raw ECG signals. The architectural design respects the symmetry inherent in ECG waveforms—such as periodicity and morphological consistency across cardiac cycles—enabling the network to extract clinically relevant features more effectively. Then, unlike conventional models that rely on softmax-based probability outputs, our approach parameterizes class distributions using the Dirichlet distribution, while Subjective Logic translates these parameters into interpretable belief masses and uncertainty scores. Evaluated on the PhysioNet Challenge 2017 dataset, the model achieves an accuracy of 86.12%, an F1 score of 83.14%, a Precision-Recall Area Under the Curve (PR-AUC) of 85.25%, and a Receiver Operating Characteristic Area Under the Curve (ROC-AUC) of 92.87%—outperforming baseline models in three out of four metrics. Critically, the model reduces overconfidence to 0.59% (compared to 12–22% in softmax-based baselines), aligning prediction confidence with true accuracy. By progressively increasing the uncertainty threshold u, the model dynamically filters low-confidence predictions, leading to consistently improved performance—reaching up to 93.59% accuracy, 93.22% F1 score, 89.17% PR-AUC, and 95.10% ROC-AUC at u = 0.1. These results validate the model’s capacity for reliable ECG interpretation while leveraging physiological signal symmetry for enhanced feature extraction.
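The subjective-logic step described above is a small closed-form computation: with K classes and non-negative per-class evidence e, the Dirichlet parameters are α = e + 1 with strength S = Σα, belief masses b_k = e_k/S, and uncertainty u = K/S, so Σb + u = 1 by construction. A minimal sketch (the evidence values are made up; in the paper they come from the network's output layer):

```python
import numpy as np

def subjective_logic_opinion(evidence):
    """Map non-negative per-class evidence to belief masses and an
    overall uncertainty: alpha = e + 1, S = sum(alpha), b_k = e_k / S,
    u = K / S. By construction sum(b) + u = 1."""
    evidence = np.asarray(evidence, dtype=float)
    K = len(evidence)
    alpha = evidence + 1.0        # Dirichlet parameters
    S = alpha.sum()               # Dirichlet strength
    belief = evidence / S
    uncertainty = K / S
    expected_prob = alpha / S     # mean of the Dirichlet distribution
    return belief, uncertainty, expected_prob

# Strong evidence for class 0 -> low uncertainty;
# no evidence at all -> maximal uncertainty (u = 1).
b, u, p = subjective_logic_opinion([18.0, 1.0, 1.0])
b0, u0, p0 = subjective_logic_opinion([0.0, 0.0, 0.0])
```

Thresholding on u (as in the paper's filtering at u = 0.1) simply abstains whenever total evidence S is too small, which is how the model trades coverage for reliability.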
- Front Matter
- 10.1016/j.jtcvs.2021.08.009
- Aug 8, 2021
- The Journal of Thoracic and Cardiovascular Surgery
Commentary: To classify means to choose a threshold