A machine-learning-based osteoporosis screening tool integrating the Shapley Additive exPlanation (SHAP) method: model development and validation study.

Abstract

Existing osteoporosis screening tools are inaccurate and inconvenient, prompting the need for a better alternative. A machine learning tool (Gradient Boosting) built on key factors (weight, age, height) outperformed the OST in validation (AUC 0.828 vs 0.781, p < 0.0001). The validated, clinically applicable tool improves the accessibility and accuracy of osteoporosis screening. As the first "line of defence" for osteoporosis detection, existing screening tools have low accuracy and are inconvenient to use. This study therefore aimed to develop a machine-learning-based, clinically applicable, and interpretable osteoporosis screening tool. The study included 9405 American participants aged 50 years and older (mean age of the osteoporosis population: 72 ± 9 years in the training set and 73 ± 8 years in the test set). Thirteen clinically accessible indicators were selected as candidate predictive variables, and the data were divided into a training set and a test set at a ratio of 7:3. Lasso regression was used for feature selection, and six statistical and machine learning models were compared. Model performance was evaluated through metrics such as the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, F1-score, decision curves, calibration curves, and clinical impact curves; the SHAP (Shapley Additive exPlanations) method was employed to enhance model interpretability, and external validation was conducted on an independent dataset from the Second Hospital of Lanzhou University. "Weight," "age," and "height" were the most critical predictive factors. The Gradient Boosting Machine (GB) showed the best results, with training and test set AUCs of 0.850 and 0.841, sensitivities of 0.757 and 0.737, specificities of 0.793 and 0.779, and F1-scores of 0.336 and 0.316, respectively.
External validation (3500 subjects) showed that the GB-based screening tool had an AUC of 0.828, significantly higher than that of the traditional Osteoporosis Self-Assessment Tool (OST, AUC = 0.781; DeLong test, z = 10.880, p < 0.0001). A clinically applicable osteoporosis screening tool based on machine learning algorithms was developed and validated.
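The pipeline described above (7:3 split, Lasso feature selection, gradient boosting, AUC evaluation) can be sketched as follows on synthetic data; the feature count matches the study's 13 candidate predictors, but all data, hyperparameters, and thresholds here are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: 13 candidate predictors, imbalanced outcome.
X, y = make_classification(n_samples=2000, n_features=13, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

# 7:3 train/test split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Lasso feature selection: keep predictors with nonzero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_tr, y_tr)
keep = np.flatnonzero(lasso.coef_)
if keep.size == 0:          # fall back to all features if Lasso zeroes out
    keep = np.arange(X.shape[1])

# Gradient boosting on the selected features, evaluated by AUC.
gb = GradientBoostingClassifier(random_state=0).fit(X_tr[:, keep], y_tr)
auc = roc_auc_score(y_te, gb.predict_proba(X_te[:, keep])[:, 1])
print(f"features kept: {keep.size}, test AUC: {auc:.3f}")
```

In practice the DeLong test used for the OST comparison is not in scikit-learn; a bootstrap comparison of AUCs is a common substitute.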

Similar Papers
  • Research Article
  • 10.3389/fneur.2025.1608264
Tau protein mediates the association between frailty and postoperative delirium: a machine learning model incorporating cerebrospinal fluid biomarkers
  • Sep 17, 2025
  • Frontiers in Neurology
  • Yizhi Liang + 12 more

Objective: Postoperative delirium (POD) is a prevalent neurological complication linked to adverse clinical outcomes. The underlying mechanisms of POD remain unclear. This study aimed to investigate the association between POD and frailty and determine whether frailty influences POD incidence. Furthermore, machine learning algorithms were utilized to identify key predictors of POD in patients undergoing hip or knee replacement. Methods: A total of 625 Han Chinese patients were recruited between September 2021 and May 2023. Preoperative frailty was assessed using the Frailty Scale and Frailty Phenotype criteria. The Mini-Mental State Examination (MMSE) evaluated preoperative cognitive function, while the Confusion Assessment Method (CAM) diagnosed POD. The severity of POD was additionally quantified using the Memorial Delirium Assessment Scale (MDAS). Receiver Operating Characteristic (ROC) curve analysis explored the association between preoperative frailty and POD, and the mediating effect of cerebrospinal fluid (CSF) biomarkers was analyzed. Ten machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), Random Forest (RF), XGBoost, K-Nearest Neighbors (KNN), AdaBoost, LightGBM, and CatBoost, were implemented to develop predictive models. The dataset was randomly split into training (70%) and testing (30%) subsets. Ten-fold cross-validation was incorporated during model training and validation to mitigate overfitting and enhance generalizability. Model performance was evaluated using multiple metrics, such as accuracy, sensitivity, specificity, precision, Brier score, area under the ROC curve (AUC), and F1 score. Furthermore, graphical analyses, including calibration curves, decision diagrams, clinical impact curves, and confusion matrices, were applied to assess model robustness and clinical utility.
Finally, SHAP (Shapley Additive Explanations) analysis elucidated the model’s decision-making process, emphasizing the pivotal role of preoperative frailty in POD prediction. Results: The incidence of POD was 14.7%. The study identified frailty, Tau, and P-tau as significant risk factors for POD (OR = 67.229, 95% CI: 34.649–130.444, p < 0.001; OR = 1.020, 95% CI: 1.016–1.024, p < 0.001; OR = 1.018, 95% CI: 1.010–1.027, p < 0.001). ROC curve analysis (AUC = 0.983) demonstrated that combining frailty with CSF biomarkers had strong predictive power for distinguishing POD. The direct effect of frailty on POD was 0.504878, the total effect was 0.6547619, and the mediating effect of Tau accounted for 22.89%. Using Lasso regression for variable selection, we subsequently identified eight predictors (frailty, Tau, Aβ42/Tau, Aβ40, age, Aβ42, P-tau, and drinking history) from the training set via logistic regression. Based on these factors, we constructed 10 machine learning models. Among all machine learning algorithms, GBM performed the best, achieving an AUC of 0.973 (95% CI, 0.973–1.000) in the test set. Furthermore, SHAP analysis confirmed that frailty and Tau were the key determinants influencing the machine learning model’s predictions. Conclusion: Preoperative frailty is an independent risk factor for POD. A machine learning model for predicting POD in patients undergoing hip or knee replacement was developed, with GBM demonstrating superior performance among all models. The GBM-based model enabled early identification of patients at high risk of delirium.
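The mediated proportion quoted above follows directly from the reported effects: the indirect (mediated) effect is the total effect minus the direct effect, and its share of the total effect gives the percentage. A quick check of the arithmetic:

```python
# Reported effects of frailty on POD (from the abstract above).
direct_effect = 0.504878
total_effect = 0.6547619

# Indirect (mediated) effect = total - direct; its share of the total
# effect is the proportion mediated by Tau.
indirect_effect = total_effect - direct_effect
mediated_pct = 100 * indirect_effect / total_effect
print(f"proportion mediated by Tau: {mediated_pct:.2f}%")  # → 22.89%
```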

  • Research Article
  • 10.3389/fmed.2025.1553274
A risk prediction model for poor joint function recovery after ankle fracture surgery based on interpretable machine learning
  • Jun 26, 2025
  • Frontiers in Medicine
  • Congyang Li + 6 more

Objective: Currently, there is no individualized prediction model for joint function recovery after ankle fracture surgery. This study aims to develop a prediction model for poor recovery following ankle fracture surgery using various machine learning algorithms to facilitate early identification of high-risk patients. Methods: A total of 750 patients who underwent ankle fracture surgery at Lu’an Hospital Affiliated to Anhui Medical University between January 2018 and December 2023 were followed up. The collected data were chronologically divided into a training set (599 cases) and a test set (151 cases). Feature variables were selected using the Boruta algorithm, and five machine learning algorithms (logistic regression, random forest, extreme gradient boosting, support vector machine, and lasso-stacking) were employed to construct models. The performance of these models was compared on both the training and test sets to select the best-performing model. The decision basis of the optimal model was further analyzed using Shapley Additive Explanation (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). Results: In total, 12 characteristic variables were identified using the Boruta algorithm. Among the five machine learning models, random forest model: AUC (training set: 0.840, test set: 0.779), accuracy (training set: 0.781, test set: 0.742); SVM: AUC (training set: 0.809, test set: 0.768), accuracy (training set: 0.751, test set: 0.728); XGBoost: AUC (training set: 0.734, test set: 0.748), accuracy (training set: 0.668, test set: 0.722); logistic regression: AUC (training set: 0.672, test set: 0.691), accuracy (training set: 0.651, test set: 0.656); lasso-stacking model: AUC (training set: 0.877, test set: 0.791), accuracy (training set: 0.796, test set: 0.762). The PR curve and decision curve of the lasso-stacking model were better than those of the other models. The lasso-stacking model had the best performance.
SHAP analysis showed that functional exercise compliance, combined ligament injury, and open fracture accounted for the largest proportion of SHAP values and were the most important influencing factors. Conclusion: Through evaluation and comparison of the developed models, the lasso-stacking model demonstrated the best performance and is more suitable for predicting joint function recovery after ankle surgery. This model can be further validated externally and applied in clinical practice.
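A lasso-stacking ensemble of the kind described, where out-of-fold predictions from base learners are combined by an L1-penalized ("lasso") logistic meta-learner, might be sketched like this with scikit-learn. The base learners, penalties, and synthetic data are illustrative assumptions, and this sketch splits randomly whereas the study split chronologically:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: 750 patients, 12 selected feature variables.
X, y = make_classification(n_samples=750, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=151, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    # L1 ("lasso") penalty on the meta-learner combining base predictions.
    final_estimator=LogisticRegression(penalty="l1", solver="liblinear"),
    cv=5,                    # out-of-fold base predictions to avoid leakage
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```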

  • Research Article
  • Cited by 8
  • 10.1097/txd.0000000000001212
Machine Learning Prediction of Liver Allograft Utilization From Deceased Organ Donors Using the National Donor Management Goals Registry.
  • Sep 27, 2021
  • Transplantation Direct
  • Andrew M Bishara + 9 more

Several machine learning classifiers were trained to predict transplantation of a liver graft. We utilized 127 variables available in the DMG dataset. We included data from potential deceased organ donors between April 2012 and January 2019. The outcome was defined as liver recovery for transplantation in the operating room. The prediction was made based on data available 12-18 h after the time of authorization for transplantation. The data were randomly separated into training (60%), validation (20%), and test (20%) sets. We compared the performance of our models to the Liver Discard Risk Index. Of 13,629 donors in the dataset, 9255 (68%) livers were recovered and transplanted, 1519 were recovered but used for research or discarded, and 2855 were not recovered. The optimized gradient boosting machine classifier achieved an area under the receiver operating characteristic curve of 0.84 on the test set, outperforming all other classifiers. This model predicts successful liver recovery for transplantation in the operating room, using data available early during donor management. It performs favorably when compared to existing models. It may provide real-time decision support during organ donor management and transplant logistics.
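The 60/20/20 partition described above can be reproduced with two successive random splits. A sketch on placeholder data of the same shape as the DMG dataset (not the authors' code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(13629, 127))        # placeholder: 127 DMG variables
y = rng.integers(0, 2, size=13629)       # placeholder outcome

# First split off a 40% hold-out, then halve it into validation and test.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4,
                                              random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)
print(len(X_tr), len(X_val), len(X_te))
```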

  • Research Article
  • 10.3389/fonc.2025.1683164
Machine learning model for predicting epidermal growth factor receptor expression status in breast cancer using ultrasound radiomics
  • Oct 17, 2025
  • Frontiers in Oncology
  • Zhirong Xu + 7 more

Background/Objectives: The epidermal growth factor receptor (EGFR) is a clinically important target, as its expression in patients with breast cancer influences both overall and disease-free survival. Current methods for assessing EGFR expression status in a patient are invasive. Therefore, in this study, we developed a machine learning-based approach utilizing ultrasound radiomics to non-invasively predict EGFR expression status in patients with breast cancer. Methods: Radiomic features were extracted from grayscale and wavelet-transformed ultrasound images of 321 patients. The dataset was randomly split into training (n = 225) and test (n = 96) sets at a 7:3 ratio with stratified sampling to preserve the EGFR+/– ratio. Key predictors were identified using a multi-step procedure: reproducibility filtering (ICC > 0.75), univariate F-test filtering (p < 0.05), and L1-regularized selection via LASSO regression. Seven machine-learning models were trained. Model interpretability was assessed using SHAP (Shapley Additive Explanations). In addition to the hold-out evaluation, we performed stratified 10-fold cross-validation to reduce selection bias. Results: The random forest model demonstrated the optimal performance, with an area under the receiver operating characteristic curve of 0.86 in the training set and 0.70 in the test set. It significantly outperformed the other models (P < 0.001). The Shapley additive explanation method was used to interpret the model, revealing that original_ngtdm_Coarseness, original_ngtdm_Strength, and wavelet.LL_glcm_ClusterProminence were the top predictors. These features reflect structural compactness and heterogeneity associated with EGFR overexpression. Conclusions: We present a reliable and interpretable tool for non-invasively assessing EGFR expression status in patients with breast cancer.
The most important predictors captured tumor heterogeneity and microstructural uniformity, highlighting the biological relevance of radiomic patterns in EGFR-positive tumors. This model integrates advanced imaging analyses with machine learning, underscoring the potential of radiomics to advance precision oncology.
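The stratified 7:3 split that preserves the EGFR+/– ratio can be sketched as follows; the class counts are hypothetical, chosen only to total the study's 321 patients:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 80 EGFR+ and 241 EGFR- patients (total 321).
y = np.array([1] * 80 + [0] * 241)
X = np.arange(321).reshape(-1, 1)    # stand-in feature matrix

# test_size=96 reproduces the 225/96 split; stratify=y keeps the
# positive rate nearly identical in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=96,
                                          stratify=y, random_state=0)
print(len(X_tr), len(X_te))          # 225 96
print(round(y_tr.mean(), 2), round(y_te.mean(), 2))
```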

  • Research Article
  • 10.21037/jtd-2025-310
Machine learning algorithms for predicting malignancy grades of lung adenocarcinoma and guiding treatments: CT radiomics-based comparisons.
  • Apr 1, 2025
  • Journal of thoracic disease
  • Jun Zhu + 8 more

Lung adenocarcinoma (LUAD) is the most frequently diagnosed subtype of non-small cell lung cancer (NSCLC). Notably, prognosis can vary significantly among LUAD patients with different tumor subtypes. The advent of radiomics and machine learning (ML) technologies enables the development of non-invasive pathology predictive models. We attempted to develop computed tomography (CT) radiomics-based diagnostic models, enhanced by ML, to predict LUAD malignancy grade and guide surgical strategies. In this retrospective analysis, a total of 168 surgical patients with histology-confirmed LUAD were divided into low-risk group (n=93) and intermediate-to-high-risk group (n=75) based on postoperative pathology. The region of interest (ROI) was delineated on the preoperative CT images for all patients, followed by the extraction of radiomic features. Patients were randomly allocated to a training set (n=117) and a testing set (n=51) in a 7:3 ratio. Within the training set, clinical-radiological model (CM) and radiomics model (RM) were developed utilizing patients' clinical characteristics, radiological semantic features, and radiomic features, along with the calculation of Rad scores. After the Rad scores were combined with independent risk factors among clinical-radiological features, logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), K-nearest neighbors (KNN), and naïve Bayes model (NBM) were employed to create different comprehensive models (COMs). The optimal model was identified based on the receiver operating characteristic (ROC) curves and the DeLong test. Finally, Shapley additive explanations (SHAP) were utilized to visualize the predictive processes of the models. 
Among the 168 patients enrolled, there were 50 males (29.76%) aged 56 (49.25, 67.00) years and 118 females (70.24%) aged 56.5 (42.00, 64.00) years. Diameter (P<0.001) and consolidation-to-tumor ratio (CTR) ≥0.5 (P=0.002) were identified as independent risk factors for the malignancy degree of LUAD during CM creation. The CM had an area under the ROC curve (AUC) of 0.909 [95% confidence interval (CI): 0.856-0.962] in the training set and 0.920 (95% CI: 0.846-0.994) in the testing set. The RM, comprising seven radiomic features, achieved an AUC of 0.961 (95% CI: 0.926-0.996) in the training set and 0.957 (95% CI: 0.905-1.000) in the testing set. Among the models created using various ML algorithms, the XGBoost model was identified as the optimal model. SHAP visualization revealed the model prediction processes and the values of different features. We constructed and validated a robust, integrative model leveraging ML and CT radiomics, which amalgamates radiomic, clinical, and radiological attributes to precisely identify LUADs with elevated postoperative pathological grades. This enables doctors to formulate different surgical plans according to the pathology of the patients' tumors before the operation.

  • Research Article
  • Cited by 14
  • 10.1186/s12911-023-02166-8
The prediction of distant metastasis risk for male breast cancer patients based on an interpretable machine learning model
  • Apr 21, 2023
  • BMC Medical Informatics and Decision Making
  • Xuhai Zhao + 1 more

Objectives: This research was designed to compare the ability of different machine learning (ML) models and a nomogram to predict distant metastasis in male breast cancer (MBC) patients and to interpret the optimal ML model with the SHapley Additive exPlanations (SHAP) framework. Methods: Four powerful ML models were developed using data from male breast cancer (MBC) patients in the SEER database between 2010 and 2015 and MBC patients from our hospital between 2010 and 2020. The area under the curve (AUC) and Brier score were used to assess the capacity of the different models. The DeLong test was applied to compare the performance of the models. Univariable and multivariable analyses were conducted using logistic regression. Results: Of the 2351 patients analyzed, 168 (7.1%) had distant metastasis (M1), 117 (5.0%) had bone metastasis, and 71 (3.0%) had lung metastasis. The median age at diagnosis was 68.0 years. Most patients did not receive radiotherapy (1723, 73.3%) or chemotherapy (1447, 61.5%). The XGB model was the best ML model for predicting M1 in MBC patients. It showed the largest AUC value in the tenfold cross-validation (AUC: 0.884; SD: 0.02), training (AUC: 0.907; 95% CI: 0.899–0.917), testing (AUC: 0.827; 95% CI: 0.802–0.857), and external validation (AUC: 0.754; 95% CI: 0.739–0.771) sets. It also showed powerful ability in the prediction of bone metastasis (AUC: 0.880, 95% CI: 0.856–0.903 in the training set; AUC: 0.823, 95% CI: 0.790–0.848 in the test set; AUC: 0.747, 95% CI: 0.727–0.764 in the external validation set) and lung metastasis (AUC: 0.906, 95% CI: 0.877–0.928 in the training set; AUC: 0.859, 95% CI: 0.816–0.891 in the test set; AUC: 0.756, 95% CI: 0.732–0.777 in the external validation set).
The AUC value of the XGB model was larger than that of the nomogram in the training (0.907 vs 0.802) and external validation (0.754 vs 0.706) sets. Conclusions: The XGB model is a better predictor of distant metastasis among MBC patients than the other ML models and the nomogram; furthermore, it is a powerful model for predicting bone and lung metastasis. Combined with SHAP values, it can help doctors intuitively understand the impact of each variable on the outcome.

  • Research Article
  • Cited by 1
  • 10.2196/66723
Development of a Longitudinal Model for Disability Prediction in Older Adults in China: Analysis of CHARLS Data (2015-2020)
  • Apr 17, 2025
  • JMIR Aging
  • Jingjing Chu + 4 more

Background: Disability profoundly affects older adults’ quality of life and imposes considerable burdens on health care systems in China’s aging society. Timely predictive models are essential for early intervention. Objective: We aimed to build effective predictive models of disability for early intervention and management in older adults in China, integrating physical, cognitive, physiological, and psychological factors. Methods: Data from the China Health and Retirement Longitudinal Study (CHARLS), spanning from 2015 to 2020 and involving 2450 older individuals initially in good health, were analyzed. The dataset was randomly divided into a training set (70%) and a testing set (30%). LASSO regression with 10-fold cross-validation identified key predictors, which were then used to develop an Extreme Gradient Boosting (XGBoost) model. Model performance was evaluated using receiver operating characteristic curves, calibration curves, and clinical decision and impact curves. Variable contributions were interpreted using SHapley Additive exPlanations (SHAP) values. Results: LASSO regression was used to evaluate 36 potential predictors, resulting in a model incorporating 9 key variables: age, hand grip strength, standing balance, the 5-repetition chair stand test (CS-5), pain, depression, cognition, respiratory function, and comorbidities. The XGBoost model demonstrated an area under the curve of 0.846 (95% CI 0.825‐0.866) for the training set and 0.698 (95% CI 0.654‐0.743) for the testing set. Calibration curves demonstrated reliable predictive accuracy, with mean absolute errors of 0.001 and 0.011 for the training and testing sets, respectively. Clinical decision and impact curves demonstrated significant utility across risk thresholds. SHAP analysis identified pain, respiratory function, and age as top predictors, highlighting their substantial roles in disability risk. Hand grip and the CS-5 also significantly influenced the model.
A web-based application was developed for personalized risk assessment and decision-making. Conclusion: A reliable predictive model for 5-year disability risk in Chinese older adults was developed and validated. This model enables the identification of high-risk individuals, supports early interventions, and optimizes resource allocation. Future efforts will focus on updating the model with new CHARLS data and validating it with external datasets.
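SHAP values like those used above are Shapley values from cooperative game theory; for a tiny model they can be computed exactly by enumerating feature coalitions. The toy linear "risk score" and baseline below are hypothetical, purely to illustrate the attribution that SHAP approximates efficiently for real models:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance; features outside a coalition
    are set to their baseline value."""
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                # Shapley weight for a coalition of this size.
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                with_i = [x[j] if (j in S or j == i) else baseline[j]
                          for j in range(d)]
                without = [x[j] if j in S else baseline[j] for j in range(d)]
                phi[i] += w * (predict(with_i) - predict(without))
    return phi

# Hypothetical linear risk score over three features.
predict = lambda v: 0.5 * v[0] + 0.3 * v[1] - 0.2 * v[2]
phi = shapley_values(predict, x=[2.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 3) for p in phi])   # attributions sum to f(x) - f(baseline)
```

For a linear model the attribution for each feature reduces to its coefficient times its deviation from baseline, which this brute-force computation reproduces.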

  • Research Article
  • 10.7717/peerj.16867
Establishment and validation of a heart failure risk prediction model for elderly patients after coronary rotational atherectomy based on machine learning.
  • Jan 31, 2024
  • PeerJ
  • Lixiang Zhang + 2 more

To develop and validate a heart failure risk prediction model for elderly patients after coronary rotational atherectomy based on machine learning methods. A retrospective cohort study was conducted, selecting 303 elderly patients with severe coronary calcification as the study subjects. According to the occurrence of postoperative heart failure, the subjects were divided into a heart failure group (n = 53) and a non-heart failure group (n = 250). Clinical data from the subjects during hospitalization were collected retrospectively. After processing the missing values in the original data and addressing sample imbalance using the Adaptive Synthetic Sampling (ADASYN) method, the final dataset consisted of 502 samples: 250 negative samples (patients without heart failure) and 252 positive samples (patients with heart failure). According to a 7:3 ratio, the dataset of 502 patients was randomly divided into a training set (n = 351) and a validation set (n = 151). On the training set, logistic regression (LR), extreme gradient boosting (XGBoost), support vector machine (SVM), and lightweight gradient boosting machine (LightGBM) algorithms were used to construct heart failure risk prediction models; model performance was evaluated on the validation set by calculating the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, F1-score, and prediction accuracy. Postoperative heart failure occurred in 17.49% of the 303 patients. The AUCs of the LR, XGBoost, SVM, and LightGBM models in the training set were 0.872, 1.000, 0.699, and 1.000, respectively. After 10-fold cross-validation, the AUCs in the training set were 0.863, 0.972, 0.696, and 0.963, respectively. Among them, XGBoost had the highest AUC and better predictive performance, while the SVM model had the worst performance.
The XGBoost model also showed good predictive performance in the validation set (AUC = 0.972, 95% CI [0.951-0.994]). The Shapley additive explanation (SHAP) method suggested that the six characteristic variables of blood cholesterol, serum creatinine, fasting blood glucose, age, triglyceride, and NT-proBNP were important positive factors for the occurrence of heart failure, while LVEF was an important negative factor. The seven characteristic variables of blood cholesterol, serum creatinine, fasting blood glucose, NT-proBNP, age, triglyceride, and LVEF are all important factors affecting the occurrence of heart failure. The heart failure risk prediction model for elderly patients after CRA based on the XGBoost algorithm is superior to the SVM, LightGBM, and traditional LR models. This model could be used to assist clinical decision-making and reduce adverse outcomes for patients after CRA.
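The class rebalancing step (53 heart-failure cases oversampled so that positives roughly match negatives) can be illustrated with a simplified SMOTE-style sampler; real ADASYN additionally biases generation toward minority points that are harder to classify. The data and neighbour count here are hypothetical:

```python
import numpy as np

def oversample(X_min, n_new, k=5, rng=None):
    """Interpolate n_new synthetic points between minority samples and
    their k nearest minority-class neighbours (SMOTE-style)."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # k nearest minority neighbours of point i (excluding itself).
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation weight in [0, 1)
        X_new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(X_new)

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(53, 10))        # 53 heart-failure cases
X_synth = oversample(X_minority, n_new=199, rng=rng)  # 53 + 199 = 252 positives
print(X_synth.shape)                          # (199, 10)
```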

  • Research Article
  • 10.1186/s13023-025-04045-z
Developing an explainable machine learning model to predict false-negative citrin deficiency cases in newborn screening
  • Oct 8, 2025
  • Orphanet Journal of Rare Diseases
  • Peiyao Wang + 9 more

Background: Neonatal Intrahepatic Cholestasis caused by Citrin Deficiency (NICCD) is an autosomal recessive disorder affecting the urea cycle and energy metabolism. Newborn screening (NBS) usually relies on elevated citrulline, but some patients have normal citrulline, resulting in false negatives and delayed diagnosis. This study develops an explainable machine learning (ML) model to predict false-negative NICCD cases during NBS. Methods: Data from 53 false-negative NICCD patients and 212 controls, collected retrospectively between 2011 and 2024, were analyzed. The dataset was split into a training set (70%) and a test set (30%). External validation involved 48 participants from distinct time periods. Key predictors were identified using variable importance in projection (VIP > 1) and Lasso regression. Six ML models were trained for evaluation: Logistic Regression, Random Forest, Light Gradient Boosting Machine, Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor, and Support Vector Machines. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) and F1 score. Shapley Additive exPlanations (SHAP) was applied to determine the importance of features and interpret the models. Results: Birth weight, citrulline, glycine, phenylalanine, ornithine, arginine, proline, succinylacetone, and C10:2 were selected as predictive features. Among the ML models, XGBoost demonstrated the most robust and consistent performance, achieving AUCs of 0.971 (95% CI: 0.959–0.979), 0.968, and 0.977, and F1 scores of 0.786 (95% CI: 0.744–0.820), 0.828, and 0.833 in the training, test, and external validation sets, respectively. SHAP analysis showed that the most important features are citrulline, glycine, phenylalanine, succinylacetone, birth weight, and ornithine. Feature pairs such as citrulline-phenylalanine, citrulline-glycine, succinylacetone-birth weight, and ornithine-glycine showed varying interactions.
SHAP force plots, decision plots, and waterfall plots provided insightful patient-level interpretations. Finally, we built a web-based calculator for the prediction of false-negative NICCD cases (https://myapp123.shinyapps.io/my_shiny_app/). Conclusion: An interpretable machine learning model utilizing metabolite and demographic data enhances the detection of false-negative NICCD cases, facilitates early identification and intervention, and ultimately improves the overall effectiveness of the newborn screening system. Supplementary Information: The online version contains supplementary material available at 10.1186/s13023-025-04045-z.

  • Research Article
  • Cited by 1
  • 10.1186/s12888-024-06384-w
Auxiliary identification of depression patients using interpretable machine learning models based on heart rate variability: a retrospective study
  • Dec 18, 2024
  • BMC Psychiatry
  • Min Yang + 5 more

Objective: Depression has emerged as a global public health concern with high incidence and disability rates, making timely identification and intervention imperative in clinical practice. The objective of this study was to explore the association between heart rate variability (HRV) and depression, with the aim of establishing and validating machine learning models for the auxiliary diagnosis of depression. Methods: The data of 465 outpatients from the Affiliated Hospital of Southwest Medical University were selected for the study. The study population was randomly divided into training and test sets in a 7:3 ratio. Logistic regression (LR), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost) models were used to construct risk prediction models in the training set, and model performance was verified in the test set. The four models were evaluated by the area under the receiver operating characteristic curve (ROC), calibration curves, and decision curve analysis (DCA). Furthermore, we employed the SHapley Additive exPlanations (SHAP) method to illustrate the effects of the features attributed to the model. Results: There were 237 people in the depressed group and 228 in the non-depressed group. In the training set (n = 325) and test set (n = 140), the area under the curve (AUC) values of the XGBoost model were 0.92 [95% confidence interval (CI): 0.888, 0.95] and 0.82 (95% CI: 0.754, 0.892), respectively, higher than those of the other three models. The XGBoost model has excellent predictive efficacy and clinical utility. The SHAP method ranked features according to their influence on the model, with age, heart rate, standard deviation of NN intervals (SDNN), two nonlinear parameters of HRV, and sex considered the top 6 predictors. Conclusion: We provided a feasibility study of HRV as a potential biomarker for depression.
The proposed model based on HRV provides clinicians with a quantitative auxiliary diagnostic tool, which helps improve the accuracy and efficiency of depression diagnosis and can also be utilized for the monitoring and prevention of depression.
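Of the HRV features ranked above, SDNN is simply the standard deviation of the normal-to-normal (NN) inter-beat intervals. With hypothetical interval data:

```python
import numpy as np

# Hypothetical NN (normal-to-normal) inter-beat intervals in milliseconds.
nn_intervals_ms = np.array([812, 790, 805, 830, 795, 820, 801, 815])

sdnn = nn_intervals_ms.std(ddof=1)   # sample standard deviation, in ms
print(f"SDNN = {sdnn:.1f} ms")
```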

  • Research Article
  • 10.21037/qims-24-1073
Time-variant and tissue-level collaterals predict postoperative neurological recovery and clinical outcomes of patients with endovascular thrombectomy.
  • May 1, 2025
  • Quantitative imaging in medicine and surgery
  • Song Liu + 14 more

A comprehensive assessment of collateral status can yield profound insights into the ischemic mechanism in patients experiencing acute ischemic stroke. This study aims to investigate whether time-variant and tissue-level collateral characteristics may serve as predictors of functional outcomes in patients undergoing endovascular thrombectomy (EVT) through the application of machine learning (ML) algorithms, and to stratify the postoperative neurological recovery of these patients. In this retrospective study, 128 acute ischemic stroke patients with anterior large-vessel occlusion who received EVT between May 2020 and December 2022 were enrolled. These patients underwent multiphase computed tomography (CT) angiography (mCTA) and CT perfusion (CTP). The time-variant collateral score was defined as the Collateral Score on Color-Coded summation maps (CSCC) of mCTA. The hypoperfusion intensity ratio (HIR) was calculated from CTP data. The data were split into training and test sets in a ratio of 7:3, and univariable and multivariable regression analyses were employed for feature selection. For ML analyses, logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), and eXtreme gradient boosting (XGBoost) algorithms were utilized. Receiver operating characteristic (ROC) curves and decision curves were employed for performance evaluation. A mixed-effects model was established to estimate the impact of collateral stratification on the postoperative National Institutes of Health Stroke Scale (NIHSS) score. Age [odds ratio (OR) = 1.073; 95% confidence interval (CI): 1.008, 1.154; P=0.040], Alberta Stroke Program Early CT Score (ASPECTS) (OR = 0.742; 95% CI: 0.546, 0.975; P=0.040), CSCC (OR = 0.468; 95% CI: 0.213, 0.953; P=0.044), and HIR (OR = 56.666; 95% CI: 3.843, 1,156.959; P=0.005) were significantly associated with good outcome in the training set.
By utilizing these four selected features, the RF algorithm achieved the best performance and the highest clinical suitability in predicting good clinical outcomes, with an area under the ROC curve (AUC) of 0.964 (95% CI: 0.902, 0.992) and 0.837 (95% CI: 0.684, 0.935) in the training and testing sets, respectively. The Shapley Additive exPlanations (SHAP) analysis revealed that HIR was the most significant variable in predicting clinical outcomes. Fixed effects and group × time interaction effects [all P<0.01 at all time points (TPs)] were observed in the HIR stratification. HIR enabled better stratification and prediction of patients' postoperative NIHSS scores [Akaike information criterion (AIC): HIR = 4,599.577 and CSCC = 4,648.707]. The RF model, trained on time-variant and tissue-level collaterals, is capable of accurately predicting the clinical outcomes of patients undergoing EVT. Stratifying patients based on HIR may yield valuable insights into predicting trends in potential postoperative neurological recovery.

  • Research Article
  • 10.1186/s12911-025-03082-9
Development and external validation of machine learning models for the early prediction of malnutrition in critically ill patients: a prospective observational study
  • Jul 3, 2025
  • BMC Medical Informatics and Decision Making
  • Yi Liu + 8 more

Background: Early detection of malnutrition in critically ill patients is crucial for timely intervention and improved clinical outcomes. However, identifying individuals at risk remains challenging due to the complexity and variability of patient conditions. This study aimed to develop and externally validate machine learning models for predicting malnutrition within 24 h of intensive care unit (ICU) admission, culminating in a web-based malnutrition prediction tool for clinical decision support.
Methods: A total of 1006 critically ill adult patients (aged ≥ 18 years) were included in the model development group, and 300 adult patients comprised the external validation group. The development data were partitioned into training (80%) and testing (20%) sets. Hyperparameters were optimized via 5-fold cross-validation on the training set, eliminating the need for a separate validation set while ensuring internal validation. External validation was performed on an independent group to assess generalizability. Predictors were selected using random forest recursive feature elimination; seven machine learning models—Extreme Gradient Boosting (XGBoost), random forest, decision tree, support vector machine (SVM), Gaussian naive Bayes, k-nearest neighbor (k-NN), and logistic regression—were trained and evaluated for accuracy, precision, recall, F1 score, Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Area Under the Precision-Recall Curve (AUC-PR). Model interpretability was analyzed using SHapley Additive exPlanations (SHAP) to quantify feature contributions.
Results: In the development phase, among 1006 patients, 34.0% had moderate malnutrition and 17.9% had severe malnutrition. The XGBoost model achieved superior predictive accuracy with an accuracy of 0.90 (95% CI = 0.86–0.94), precision of 0.92 (95% CI = 0.88–0.95), recall of 0.92 (95% CI = 0.89–0.95), F1 score of 0.92 (95% CI = 0.89–0.95), AUC-ROC of 0.98 (95% CI = 0.96–0.99), and AUC-PR of 0.97 (95% CI = 0.95–0.99) on the testing set. External validation confirmed robust performance with an accuracy of 0.75 (95% CI: 0.70–0.79), precision of 0.79 (95% CI: 0.75–0.83), recall of 0.75 (95% CI: 0.70–0.79), F1 score of 0.74 (95% CI: 0.69–0.78), AUC-ROC of 0.88 (95% CI: 0.86–0.91), and AUC-PR of 0.77 (95% CI: 0.73–0.80).
Conclusions: Machine learning models, particularly XGBoost, demonstrated promising performance in early malnutrition prediction in ICU settings. The resulting web-based tool offers a valuable resource for clinical decision support.
Trial registration: Chinese Clinical Trial Registry ChiCTR2200058286 (https://www.chictr.org.cn/bin/project/edit?pid=248690). Registered 4th April 2022. Prospectively registered.
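The feature-selection step this abstract describes, random forest recursive feature elimination followed by cross-validated evaluation, can be sketched with scikit-learn. This is a minimal sketch on synthetic data under stated assumptions (the feature count and the number of retained features are illustrative placeholders, not the study's values), and scikit-learn's gradient-boosted and forest classifiers stand in for the paper's XGBoost implementation.

```python
# Sketch: random-forest recursive feature elimination (RFE), then 5-fold
# cross-validated AUC on the selected features. Synthetic data throughout.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# 1006 samples mirrors the development cohort; 20 candidate features
# and the "keep 6" choice are illustrative assumptions.
X, y = make_classification(n_samples=1006, n_features=20, n_informative=6,
                           random_state=42)

selector = RFE(RandomForestClassifier(n_estimators=100, random_state=42),
               n_features_to_select=6).fit(X, y)
X_sel = X[:, selector.support_]  # keep only the retained columns

# 5-fold CV on the training data doubles as internal validation,
# as described in the Methods above.
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_sel, y, cv=5, scoring="roc_auc")
print(f"mean 5-fold CV AUC: {scores.mean():.3f}")
```

In practice one would wrap this in a pipeline and tune hyperparameters inside the same 5-fold loop to avoid leaking information from the held-out folds.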

  • Research Article
  • Cited by 8
  • 10.1016/j.ecoenv.2024.117210
Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology
  • Oct 23, 2024
  • Ecotoxicology and Environmental Safety
  • Qingan Fu + 7 more

  • Research Article
  • 10.3389/fonc.2025.1569729
Predicting recurrence risk in endometrial cancer: a multisequence MRI intratumoral and peritumoral radiomics nomogram approach.
  • May 6, 2025
  • Frontiers in oncology
  • Jie Li + 6 more

To assess the predictive value of a nomogram model incorporating clinical factors and multisequence MRI intratumoral and peritumoral radiomics features for estimating recurrence risk in endometrial cancer (EC) patients. This retrospective study included 184 patients with EC. The samples were randomly divided into a training set and a test set according to a 7:3 ratio, and intratumoral and peritumoral radiomics features were extracted from diffusion-weighted imaging (DWI) and T2-weighted imaging (T2WI) sequences. Optimal radiomics features were selected using the f-classification function, minimum redundancy maximum relevance (mRMR) method, and least absolute shrinkage and selection operator (Lasso). Nine machine learning classifiers were employed to construct the intratumoral model (RM1). The best-performing classifiers were then used to develop the intratumoral and peritumoral 2 mm radiomics model (RM2) and the intratumoral and peritumoral 4 mm radiomics model (RM3). The radiomics scores (Rad-score) from the top-performing radiomics model were combined with clinical factors to create the nomogram model (FM). The predictive performance of the FM model was evaluated using receiver operating characteristic (ROC) curve analysis, calibration curve assessment, clinical decision curve analysis (DCA), clinical impact curve (CIC), and the DeLong test. Feature importance was analyzed using the SHapley Additive exPlanations (SHAP) methodology. The logistic regression classifier (LR) showed optimal predictive efficacy, and RM2 demonstrated the best diagnostic performance. The clinical decision curve and DeLong test results indicated that the FM model was the optimal recurrence model in EC patients. A nomogram model integrating MRI radiomics features from intratumoral and peritumoral regions and clinical factors effectively predicts recurrence in EC patients.
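The core of the radiomics workflow above, Lasso-style feature selection over many extracted features followed by a logistic-regression classifier, can be sketched as follows. This is a hedged illustration on synthetic "radiomics" features, not the study's pipeline: the feature count, regularization strength, and split seed are assumptions, and L1-penalized logistic regression is used as a Lasso-type selector.

```python
# Sketch: Lasso-type (L1) feature selection followed by logistic regression,
# evaluated with ROC AUC on a 7:3 split. Features are synthetic stand-ins
# for extracted radiomics features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 184 samples mirrors the cohort; 50 candidate features is an assumption.
X, y = make_classification(n_samples=184, n_features=50, n_informative=5,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# L1-penalized logistic regression zeroes out uninformative coefficients,
# performing Lasso-style selection directly (C=0.5 is an illustrative choice).
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_tr, y_tr)
kept = np.flatnonzero(lasso.coef_[0])  # indices of retained features

clf = LogisticRegression(max_iter=1000).fit(X_tr[:, kept], y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te[:, kept])[:, 1])
print(f"{len(kept)} features kept, test AUC: {auc:.3f}")
```

The study additionally filters with the f-classification function and mRMR before Lasso; those stages would precede the L1 step here.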

  • Research Article
  • 10.1161/circ.152.suppl_3.4364966
Abstract 4364966: A Novel Machine Learning-based Adverse Cardiovascular Events Risk Algorithm For Cancer Patients Treated With Tyrosine Kinase Inhibitors
  • Nov 4, 2025
  • Circulation
  • Shawn Wahi + 4 more

Background: Cancer patients treated with tyrosine kinase inhibitors (TKIs) have an increased risk of adverse cardiovascular events (ACE). Traditional cardiovascular risk scores may not adequately capture TKI-associated cardiovascular toxicities or the unique features that contribute to ACE risk in this population. Recent studies have developed cardiovascular risk scores for cancer patients, achieving area under the receiver operating characteristic curve (AUC) values ranging from 0.65 to 0.85. Currently, there is no validated ACE risk algorithm designed specifically for TKI patients. Research question: Among cancer patients receiving TKIs, how well can a validated, interpretable machine learning-based algorithm predict risk of ACE? Methods: We analyzed 828 cancer patients treated with TKIs between 2020 and 2024 at a large academic center. Patient variables included demographics, comorbidities, lab values, cancer type, and imaging findings from echocardiography and cardiac MRI. The composite ACE outcome comprised myocardial infarction, coronary artery disease (CAD), arrhythmias, heart failure, valvular disease, atrioventricular block, and myocarditis. Data were partitioned into train (80%), test (10%), and holdout validation (10%) sets. An extreme gradient boosting (XGB) classifier was trained using 4-fold cross-validation on the train set, and performance was evaluated on the test set. Shapley Additive Explanation (SHAP) values were used to identify top predictive features. A multivariate logistic regression model was fit using selected features (based on SHAP values and clinical expertise) to form the final ACE risk score, which was then evaluated on the validation set. Results: ACE occurred in 37.8% of patients in our cohort. The XGB model achieved an AUC of 0.76 on the test set (Figure 1A).
Top SHAP features included age, sex, BMI, ejection fraction, hypertension, strain, metastasis, peripheral vascular disease, creatinine, hyperlipidemia, CAD, and chronic kidney disease (Figure 1C, Figure 2). The final ACE risk algorithm trained on these features achieved an AUC of 0.71, 71% accuracy, precision of 0.73, and specificity of 0.92 on the holdout validation set (Figure 1B). We integrated our ACE risk algorithm into a clinician-friendly online calculator (Figure 3). Conclusion: We present a novel, interpretable, and clinically usable ACE risk score for cancer patients treated with TKIs, which may improve risk stratification and cardiovascular monitoring in this high-risk population.
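The two-stage design described above, a boosted-tree model that ranks features, followed by a logistic regression on the top features to form the final risk score, can be sketched as below. This is a hedged sketch on synthetic data: scikit-learn's GradientBoostingClassifier stands in for XGBoost, impurity-based feature importances stand in for SHAP values, and the feature count and "top 6" cutoff are illustrative assumptions.

```python
# Sketch: gradient boosting ranks features (stand-in for XGB + SHAP);
# a logistic regression on the top-ranked features forms the final risk
# score, evaluated on a holdout validation split. Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 828 samples mirrors the cohort; 15 candidate features is an assumption.
X, y = make_classification(n_samples=828, n_features=15, n_informative=5,
                           random_state=7)
# 80% train, then split the remaining 20% into test (10%) and holdout (10%).
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.2, random_state=7)
X_te, X_val, y_te, y_val = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=7)

gb = GradientBoostingClassifier(random_state=7).fit(X_tr, y_tr)
top = np.argsort(gb.feature_importances_)[::-1][:6]  # top-6 ranked features

risk = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
val_auc = roc_auc_score(y_val, risk.predict_proba(X_val[:, top])[:, 1])
print(f"holdout validation AUC: {val_auc:.3f}")
```

Distilling a black-box ranker into a small logistic model, as the abstract describes, trades some accuracy for a score that is transparent enough to publish as an online calculator.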
