Machine Learning Models Integrating Two-Dimensional Speckle Tracking Echocardiography and Clinical Variables for Diagnosis of Severe Coronary Artery Disease.

Abstract

To develop and validate machine learning (ML) models integrating two-dimensional speckle tracking echocardiography (2D-STE) parameters with clinical variables for robust identification of severe coronary artery disease (sCAD). In this retrospective cohort study, five ML models (Random Forest [RF], Support Vector Machine [SVM], K-Nearest Neighbors [KNN], Multi-Layer Perceptron [MLP], and Extremely Randomized Trees [Extra Trees]) were constructed to identify sCAD in a cohort of 204 patients (80% training set, 20% independent test set). In the independent test set, the diagnostic performance of two junior sonographers for sCAD was assessed first without and then with ML assistance, with a 2-week interval between the two sessions. SHapley Additive exPlanations (SHAP) analysis was applied to interpret the models and identify the key features driving sCAD prediction, and the results were visualized with dependence plots and force plots. Furthermore, a clinical nomogram integrating key predictors identified by the ML models was developed to enable individualized quantification of sCAD risk. Using five features, the MLP showed the best performance, with an area under the curve (AUC) of 0.870 and a sensitivity of 0.944. SHAP analysis of this model indicated that "LV AP4 Endo Peak L. Time SD" strongly influenced its predictions. The MLP model (AUC = 0.870) outperformed both junior sonographers (AUC = 0.687) and a nomogram constructed from ML-selected features (AUC = 0.712). Additionally, the junior sonographers' performance improved significantly when they were assisted by the ML models. The developed ML models could differentiate patients with angiography-confirmed sCAD from those without. Importantly, these models significantly improved the diagnostic performance of junior sonographers when used as an assistive tool.
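AUC is the headline metric in the comparisons above (MLP 0.870, junior sonographers 0.687, nomogram 0.712). As a minimal sketch of what that number measures, the snippet below computes AUC as the probability that a randomly chosen sCAD case receives a higher score than a randomly chosen non-sCAD case, with ties counting one half (the Mann-Whitney formulation). The function name and the toy labels and scores are hypothetical, not data from the study.

```python
from typing import Sequence


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    Every (positive, negative) pair is compared: a win scores 1, a tie 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sCAD on angiography, 0 = no sCAD.
y = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc_mann_whitney(y, model_scores))  # 8 of 9 pairs ranked correctly, about 0.889
```

The pairwise loop is O(P·N), which is fine for a test set of about 40 patients. Deciding whether two AUCs measured on the same patients (model vs. reader) truly differ typically calls for a paired method such as DeLong's test rather than a raw difference.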

Similar Papers
  • Research Article
  • 10.1007/s10620-025-09646-z
Value of Endoscopic Ultrasonography for Distinguishing Malignant from Benign Non-pancreatic Periampullary Lesions: An Explainable Machine Learning Study.
  • Jan 9, 2026
  • Digestive diseases and sciences
  • Xue-Yong Zuo + 2 more

Early discrimination of non-pancreatic periampullary lesions (NPLs) is challenging owing to their complex anatomy and the absence of characteristic clinical symptoms. This study aimed to establish an interpretable machine learning (ML) model that integrates clinical variables and endoscopic ultrasonography (EUS) features to diagnose NPLs. A total of 158 patients with suspected NPLs who underwent EUS were enrolled and randomly allocated to a training cohort (TC, n = 110) and a validation cohort (VC, n = 48). Clinical and EUS risk features were identified by multivariate logistic regression analysis and then input into five ML classifiers to develop predictive models. The performance of the ML models was assessed using the area under the curve (AUC), calibration curves, and decision curve analysis (DCA). The Shapley Additive Explanations (SHAP) approach was used to interpret the results of the optimal ML model. Among the five ML models, the ExtraTrees model achieved the highest AUC values: 0.94 (95% confidence interval (CI): 0.89-0.99) in the TC and 0.94 (95% CI: 0.82-1.00) in the VC. It was followed by the extreme gradient boosting model (AUC = 0.94/0.93), the light gradient boosting machine (AUC = 0.92/0.91), the support vector machine (AUC = 0.91/0.94), and the logistic regression model (AUC = 0.86/0.87). The calibration curve and DCA indicated good agreement and greater clinical benefit for the ExtraTrees model. SHAP analysis identified abdominal discomfort, lesion diameter, irregular shape, surface ulceration, and nonsmooth margin as the most influential features in the model's decision-making. The developed ML models, particularly the ExtraTrees model, showed superior capability and greater clinical benefit in distinguishing malignant from benign NPLs. Furthermore, SHAP analysis provided insightful interpretation of the ExtraTrees model, enabling individualized and transparent prediction of NPLs.
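SHAP, used above to rank abdominal discomfort, lesion diameter, and the other features for the ExtraTrees model, is built on Shapley values from cooperative game theory. The sketch below shows the definition itself, not the SHAP library, which relies on faster estimators such as TreeSHAP or KernelSHAP. It computes exact Shapley values by enumerating every feature coalition. The value function, which fills in features missing from a coalition with a fixed baseline, and the linear toy model are both hypothetical simplifications.

```python
import itertools
import math
from typing import Callable, Dict, FrozenSet, Sequence


def exact_shapley(features: Sequence[str],
                  value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by brute-force enumeration (2^M coalitions).

    phi_i = sum over S not containing i of |S|! (M-|S|-1)! / M! * [v(S + {i}) - v(S)]
    """
    m = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(m):
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical linear "model" f = 1 + 2a + 3b explained for one instance;
# features outside the coalition take their baseline value.
weights = {"a": 2.0, "b": 3.0}
x = {"a": 1.0, "b": 2.0}
baseline = {"a": 0.0, "b": 0.0}


def v(coalition: FrozenSet[str]) -> float:
    return 1.0 + sum(w * (x[k] if k in coalition else baseline[k]) for k, w in weights.items())


print(exact_shapley(list(weights), v))  # {'a': 2.0, 'b': 6.0}
```

The values sum to f(x) - f(baseline) = 9 - 1 = 8, which is the additivity property that force plots display. Brute force is only practical for a handful of features, which is why dedicated estimators exist.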

  • Research Article
  • Cited by 2
  • 10.1186/s12911-024-02749-z
Explainable machine learning model for predicting the risk of significant liver fibrosis in patients with diabetic retinopathy
  • Nov 11, 2024
  • BMC Medical Informatics and Decision Making
  • Gangfeng Zhu + 12 more

Background: Diabetic retinopathy (DR), a prevalent complication in patients with type 2 diabetes, has attracted increasing attention. Recent studies have explored a plausible association between retinopathy and significant liver fibrosis. This study aimed to develop a machine learning (ML) model using comprehensive clinical data to predict the likelihood of significant liver fibrosis in patients with retinopathy, and to interpret the model using the SHapley Additive exPlanations (SHAP) method.
Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) 2005–2008 cohort. Liver fibrosis was graded (F0-F4) using the Fibrosis-4 index (FIB-4). Retinopathy severity was determined from retinal imaging and classified into four grades. Ten-fold cross-validation was used to model the risk of liver fibrosis. Eight ML methods were applied: Extreme Gradient Boosting, Random Forest, multilayer perceptron, Support Vector Machines, Logistic Regression (LR), Naive Bayes, Decision Tree, and k-nearest neighbors. Model performance was assessed using metrics such as the area under the curve (AUC). The SHAP method was used to quantify feature importance and explain how the ML model reaches its predictions.
Results: The analysis included 5,364 participants, of whom 2,116 (39.45%) had significant liver fibrosis. After random allocation, 3,754 individuals were assigned to the training set and 1,610 to the validation cohort. Nine variables were selected for the ML models. Among the eight models, the LR model achieved the highest AUC (0.867, 95% CI: 0.855–0.878) and F1 score (0.749, 95% CI: 0.732–0.767). In internal validation, this model retained its advantage, with an AUC of 0.850 and an F1 score of 0.736, outperforming all other ML models. SHAP importance ranking identified the most influential factors.
Conclusion: ML models were developed from clinical data to identify patients with retinopathy at risk of significant liver fibrosis, enabling early intervention.
Practice implications: Improved early detection of liver fibrosis risk in patients with retinopathy enhances the outcomes of clinical intervention.
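The Methods mention ten-fold cross-validation. Below is a minimal sketch of how k-fold indices can be generated, assuming one seeded shuffle followed by round-robin assignment to folds. The function name and seed are hypothetical, and the abstract does not say whether the study's folds were stratified by fibrosis status.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k folds of roughly equal size.

    Indices are shuffled once with a seeded RNG and then dealt round-robin
    into folds, so every index lands in exactly one validation fold.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, val


# Usage: one model is fitted per (train, val) pair and the k validation scores are averaged.
for train, val in kfold_indices(20, k=5):
    print(len(train), len(val))  # 16 4, printed five times
```

With an outcome prevalence near 39%, plain shuffling yields folds with roughly similar case mixes. Stratified splitting, which shuffles each class separately, guarantees balanced folds and is often preferred.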

  • Research Article
  • Cited by 1
  • 10.2196/71229
A Machine Learning–Based Prognostication Model Enhances Prediction of Early Hepatic Encephalopathy in Patients With Noncancer-Related Cirrhosis: Multicenter Longitudinal Cohort Study in Taiwan
  • Aug 6, 2025
  • JMIR Medical Informatics
  • Hsin-Yu Chen + 4 more

Background: Hepatic encephalopathy (HE) contributes significantly to mortality among patients with liver cirrhosis. Early prediction of HE is essential for clinical decision-making but remains challenging, particularly in noncancer-related cirrhosis, because of the unpredictable disease course.
Objective: This study aimed to develop a novel machine learning (ML) model to improve early prediction of HE in patients with noncancer-related cirrhosis.
Methods: A multicenter, retrospective cohort study was conducted from January 2010 to December 2017 across all Chang Gung Memorial Hospital branches in northern, central, and southern Taiwan. Several ML models were applied to evaluate HE predictability, and their performance was compared in the training and testing datasets. Optimal sensitivity and specificity were determined using the Youden index. The best ML model was interpreted using Shapley Additive Explanations plots.
Results: A total of 5878 patients with cirrhosis were included in the analysis, of whom 1187 (20.2%) subsequently developed HE. Compared with the non-HE group, patients with HE were older (median age 55, IQR 46-65 vs median age 54, IQR 44-66 years; P=.04) and had higher rates of hepatitis B virus infection (351/1187, 30% vs 961/4691, 20.5%; P<.001), alcohol use (540/1187, 45.5% vs 1512/4691, 32.2%; P<.001), sepsis (393/1187, 33.1% vs 792/4691, 16.9%; P<.001), and mortality (425/1187, 35.8% vs 502/4691, 10.7%; P<.001), along with distinct laboratory abnormalities reflecting liver dysfunction. Among the ML algorithms evaluated, extreme gradient boosting showed the best predictive performance, with an area under the curve (AUC) of 0.86 (95% CI 0.83-0.88) in the testing dataset. This was significantly better than the neural network (AUC 0.79, 95% CI 0.76-0.81; P<.001), the support vector machine (AUC 0.77, 95% CI 0.73-0.80; P<.001), and the Model for End-Stage Liver Disease score (AUC 0.74, 95% CI 0.71-0.77; P<.001). At a probability threshold of 0.25, the extreme gradient boosting model had a sensitivity of 72% (95% CI 67%-77%), a specificity of 80% (95% CI 78%-82%), a positive predictive value of 48% (95% CI 43%-53%), and a negative predictive value of 92% (95% CI 90%-94%) in the testing set. Performance was comparable in the training dataset, with a sensitivity of 80% (95% CI 77%-83%), a specificity of 81% (95% CI 80%-82%), and a negative predictive value of 94% at the same threshold. The most influential predictors identified by the model were serum ammonia, aspartate transaminase, alanine transaminase, prothrombin time, and serum potassium.
Conclusions: We developed a novel ML model for predicting HE in patients with noncancer-related cirrhosis. This model provides a practical guide to help physicians and patients make shared decisions about treatment strategy, with the ultimate goal of improving clinical care and reducing the burden of HE-related complications.
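The Results report sensitivity, specificity, PPV, and NPV at a fixed probability threshold of 0.25, and the Methods mention the Youden index. The sketch below shows both calculations. The function names and toy data are hypothetical. Note that PPV and NPV, unlike sensitivity and specificity, depend on outcome prevalence (20.2% in this cohort).

```python
from typing import Dict, List, Sequence


def metrics_at_threshold(labels: Sequence[int], probs: Sequence[float],
                         threshold: float) -> Dict[str, float]:
    """Confusion-matrix metrics when predicted risk >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")  # undefined when the denominator is empty

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Observed score that maximizes J = sensitivity + specificity - 1."""
    candidates: List[float] = sorted(set(probs))
    best_t, best_j = candidates[0], -1.0
    for t in candidates:
        m = metrics_at_threshold(labels, probs, t)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical toy data (1 = developed HE).
print(metrics_at_threshold([1, 1, 0, 0, 0], [0.8, 0.3, 0.35, 0.2, 0.1], 0.25))
```

A low threshold such as 0.25 trades specificity for sensitivity, which fits a screening use where missing a future HE case costs more than a false alarm.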

  • Research Article
  • 10.1177/08850666251390848
An Interpretable Machine Learning Model for Early Multitemporal Prediction of Onset of Acute Kidney Injury in Intensive Care Unit Patients with Severe Trauma.
  • Oct 29, 2025
  • Journal of intensive care medicine
  • Bingrui Gao + 3 more

Acute kidney injury (AKI), a leading cause of organ failure in critically ill patients, requires early identification of high-risk patients to improve outcomes. However, comparative analyses of diagnostic and prognostic machine learning (ML) models across multiple post-admission time frames are lacking. Using MIMIC-IV, we applied the Boruta algorithm for feature selection and developed and compared six ML models to predict AKI risk at 0-24, 24-48, 48-72, 0-48, and 0-72 h after ICU admission. Model performance was evaluated using the area under the curve (AUC) and confusion matrices. Decision curve and calibration analyses assessed clinical applicability. The ML models were compared with the Sequential Organ Failure Assessment (SOFA) and SAPS II scores to evaluate their accuracy. Finally, Shapley Additive Explanations (SHAP) values were used to interpret and visualize the key features of the optimal model. The study included 2092 trauma patients in the intensive care unit (ICU). Using 17 features selected from 48 candidates at 24 h after ICU admission, all six ML models outperformed SOFA and SAPS II, and extreme gradient boosting (XGBoost) performed best. It achieved an AUC of 0.948 (95% CI [0.929-0.966]) for AKI prediction within 24 h of admission, and AUCs of 0.941 ([0.892-0.917]) and 0.878 ([0.863-0.892]) for the 0-48 and 0-72 h periods, respectively. However, predictive accuracy was very limited at 24-48 h (AUC 0.602 [0.562-0.643]) and 48-72 h (AUC 0.490 [0.429-0.551]). Urine output per kilogram per hour at 6 and 12 h and age were the most important features identified by SHAP analysis. ML models performed well in diagnosing AKI risk in ICU trauma patients but had limited prognostic accuracy at 24-48 and 48-72 h after admission. Further research using time-series ML models with optimal windows is needed to improve this.
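The five prediction windows mix diagnostic windows (0-24 h) and prognostic windows (24-48 h, 48-72 h), and the outcome label depends on how each window is defined. The sketch below shows one way to label a patient per window from the hour of AKI onset. A key assumption that the abstract does not state is that a patient whose AKI began before a window opens is excluded from that window (returned as `None`) rather than counted as negative. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Prediction windows (hours after ICU admission) named in the abstract.
WINDOWS: Tuple[Tuple[int, int], ...] = ((0, 24), (24, 48), (48, 72), (0, 48), (0, 72))


def window_label(onset_h: Optional[float], start: int, end: int) -> Optional[int]:
    """1 if AKI onset falls in [start, end); 0 if no AKI before `end`.

    Assumption: AKI that began before `start` excludes the patient (None).
    """
    if onset_h is None or onset_h >= end:
        return 0
    if onset_h < start:
        return None
    return 1


def label_patient(onset_h: Optional[float]) -> Dict[str, Optional[int]]:
    """Label one patient for every window; onset_h is None when AKI never occurred."""
    return {f"{s}-{e}h": window_label(onset_h, s, e) for s, e in WINDOWS}


print(label_patient(30.0))
```

Under this labeling, the 24-48 h and 48-72 h tasks ask the model to anticipate AKI that has not yet begun, which is a harder prognostic problem. That difficulty is consistent with the much lower AUCs reported for those windows.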

  • Research Article
  • Cited by 3
  • 10.1016/j.anl.2024.09.002
Development of machine learning models to predict papillary carcinoma in thyroid nodules: The role of immunological, radiologic, cytologic and radiomic features
  • Sep 20, 2024
  • Auris Nasus Larynx
  • Luca Canali + 9 more

  • Research Article
  • 10.1371/journal.pone.0323949
Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population
  • Jun 17, 2025
  • PLOS One
  • Sazzli Kasim + 7 more

Background: Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.
Objective: This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.
Methods: Using data from the REDISCOVER Registry (5,688 participants, 2007 to 2017), 30 clinically relevant features were selected, and several ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN), and Naive Bayes (NB). Ensemble models were also created using three commonly used meta-learners: RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split 70:30 into training and test sets, with 5-fold cross-validation to ensure robust performance. Model evaluation was based primarily on the Area Under the Curve (AUC). Additional metrics, including sensitivity, specificity, and the Net Reclassification Index (NRI), were used to compare the ML models with traditional risk scores such as the Framingham Risk Score (FRS) and the Revised Pooled Cohort Equations (RPCE).
Results: The LR model achieved the highest AUC (0.77), outperforming the FRS (AUC = 0.72) and RPCE (AUC = 0.74). The ensemble model performed robustly but did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight, and waist circumference. The NRI improved significantly, by 13.15% compared with the FRS and by 7.00% compared with the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, supporting ongoing validation and clinical applicability.
Conclusions: These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.
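The Results express reclassification gains as NRI percentages relative to the FRS and RPCE. Below is a minimal sketch of the categorical NRI: upward moves are rewarded among events and downward moves among non-events. The risk cut points in `to_category` are hypothetical, and the abstract does not say whether the study used a categorical or a continuous (category-free) NRI.

```python
from typing import Sequence


def to_category(risk: float, cuts: Sequence[float] = (0.10, 0.20)) -> int:
    """Ordered risk category (0 = lowest); the cut points are hypothetical."""
    return sum(1 for c in cuts if risk >= c)


def categorical_nri(labels: Sequence[int], old_cat: Sequence[int],
                    new_cat: Sequence[int]) -> float:
    """NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)]."""
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for y, old, new in zip(labels, old_cat, new_cat):
        if y == 1:
            n_e += 1
            up_e += int(new > old)
            down_e += int(new < old)
        else:
            n_ne += 1
            up_ne += int(new > old)
            down_ne += int(new < old)
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Toy example: two events and three non-events, categories from an old and a new model.
print(categorical_nri([1, 1, 0, 0, 0], [0, 1, 1, 0, 1], [1, 1, 0, 1, 1]))  # 0.5
```

An NRI of 0.1315 would be reported as "13.15%". Because the value depends on the chosen cut points, categorical NRIs are only comparable across studies that use the same risk categories.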

  • Research Article
  • Cited by 1
  • 10.1186/s12933-025-02911-5
An ensemble machine learning-based risk stratification tool for 30-day mortality prediction in critically ill cardiovascular patients.
  • Sep 30, 2025
  • Cardiovascular diabetology
  • Mingxing Lei + 11 more

Early mortality prediction in critically ill patients with cardiovascular disease remains challenging. This study aimed to develop and validate an ensemble machine learning (ML) model to predict 30-day mortality, to compare its performance with conventional severity scores, and to examine the incremental prognostic value of the stress hyperglycemia ratio (SHR). A retrospective cohort of 1,595 ICU patients with cardiovascular disease and diabetes (2008-2022) was analyzed. SHR was calculated as admission glucose divided by the estimated average glucose (eAG) derived from HbA1c. Six ML models (eXtreme Gradient Boosting [XGBoost], Decision Tree [DT], Random Forest [RF], Artificial Neural Network [ANN], Logistic Regression [LR], and Support Vector Machine [SVM]) were trained on 80% of the data, and the top three performers were combined into an ensemble model. Model performance was evaluated using the area under the curve (AUC), precision-recall, calibration, and clinical utility metrics. The 30-day mortality rate in the entire cohort was 10.8% (173 deaths). The ensemble model showed superior predictive performance, with an AUC of 0.912 (95% CI: 0.888-0.936), outperforming both the individual ML models (e.g., XGBoost, AUC = 0.903) and traditional scoring systems (APS III/SOFA/SAPS II AUCs ≤ 0.742; all P < 0.001). The six most important predictors were anti-hypertensives, aspirin, blood urea nitrogen (BUN), white blood cell count (WBC), age, and red blood cell count (RBC). Shapley Additive Explanations analysis revealed clinically meaningful patterns: a nonlinear risk escalation with age, linear risk increases with rising BUN and bilirubin levels, a protective effect of higher RBC counts, and increased early death risk at both low and high WBC levels. While SHR significantly improved the performance of traditional scoring systems (e.g., increasing the SOFA AUC from 0.741 to 0.757, P = 0.010), adding it to the ensemble model provided limited incremental benefit (ΔAUC = -0.032, P = 0.094). External validation in an independent cohort (n = 307) confirmed the model's robustness (AUC = 0.891, 95% CI: 0.864-0.917), and decision curve analysis showed superior clinical utility across a wide range of risk thresholds. The ensemble ML model outperformed conventional prognostic tools in predicting 30-day mortality; SHR augmented the traditional tools but not the ensemble ML model. This approach offers a reliable, interpretable framework for risk stratification in high-risk cardiovascular patients.
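The abstract defines SHR as admission glucose divided by HbA1c-derived estimated average glucose. The sketch below assumes the widely used ADAG conversion, eAG (mg/dL) = 28.7 × HbA1c (%) − 46.7. The abstract does not state which conversion or glucose units the study used, and the function names are hypothetical.

```python
def estimated_average_glucose_mg_dl(hba1c_percent: float) -> float:
    """eAG in mg/dL from HbA1c in %, using the ADAG linear regression (assumed)."""
    return 28.7 * hba1c_percent - 46.7


def stress_hyperglycemia_ratio(admission_glucose_mg_dl: float, hba1c_percent: float) -> float:
    """SHR = admission glucose / eAG (both in mg/dL)."""
    eag = estimated_average_glucose_mg_dl(hba1c_percent)
    if eag <= 0:
        raise ValueError("HbA1c too low for the linear eAG formula")
    return admission_glucose_mg_dl / eag


# Hypothetical patient: HbA1c 7.0% gives eAG 154.2 mg/dL; admission glucose 231.3 mg/dL.
print(round(stress_hyperglycemia_ratio(231.3, 7.0), 3))  # 1.5
```

Dividing by the patient's own chronic glycemic level is what separates acute stress hyperglycemia from baseline diabetes control. This is why SHR can add information to severity scores that use raw glucose.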

  • Research Article
  • Cited by 33
  • 10.1007/s00330-020-07083-2
Improved long-term prognostic value of coronary CT angiography-derived plaque measures and clinical parameters on adverse cardiac outcome using machine learning
  • Jul 28, 2020
  • European Radiology
  • Christian Tesche + 13 more

To evaluate the long-term prognostic value of coronary CT angiography (cCTA)-derived plaque measures and clinical parameters for major adverse cardiac events (MACE) using machine learning (ML). Datasets of 361 patients (61.9 ± 10.3 years, 65% male) with suspected coronary artery disease (CAD) who underwent cCTA were retrospectively analyzed, and MACE was recorded. cCTA-derived adverse plaque features and conventional CT risk scores, together with cardiovascular risk factors, were provided to an ML model to predict MACE. A boosted ensemble algorithm (RUSBoost) using decision trees as weak learners was trained and validated with repeated nested cross-validation. Performance of the ML model was quantified using the area under the curve (AUC). MACE occurred in 31 patients (8.6%) after a median follow-up of 5.4 years. Discriminatory power was significantly higher for the ML model (AUC 0.96 [95% CI 0.93-0.98]) than for conventional CT risk scores, including the Agatston calcium score (AUC 0.84 [95% CI 0.80-0.87]), segment involvement score (AUC 0.88 [95% CI 0.84-0.91]), and segment stenosis score (AUC 0.89 [95% CI 0.86-0.92]; all p < 0.05). Similar results were found for adverse plaque measures (AUCs 0.72-0.82, all p < 0.05) and clinical parameters including the Framingham risk score (AUCs 0.71-0.76, all p < 0.05). The ML model also showed significantly higher diagnostic performance than logistic regression analysis (AUC 0.96 vs. 0.92, p = 0.024). Integrating an ML model improves the long-term prediction of MACE compared with conventional CT risk scores, adverse plaque measures, and clinical information. ML algorithms may improve the integration of patient information to enhance risk stratification.
• An ML model shows high discriminatory power for predicting major adverse cardiac events (MACE).
• ML-based risk stratification shows superior diagnostic performance for MACE prediction compared with coronary CT angiography (cCTA)-derived risk scores or clinical parameters alone.
• An ML model outperforms conventional logistic regression analysis for the prediction of MACE.
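RUSBoost combines random undersampling (RUS) of the majority class with boosting, which suits an outcome as imbalanced as this one (31 MACE among 361 patients). The sketch below shows only the undersampling step on a hypothetical label vector of that size. In RUSBoost itself, undersampling is repeated at every boosting round, and the decision-tree learners and nested cross-validation are not shown here.

```python
import random
from typing import List, Sequence


def random_undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Indices of a class-balanced subset of a binary-labelled dataset.

    Every minority-class case is kept; the majority class is sampled
    without replacement down to the same size.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    return sorted(minority + rng.sample(majority, len(minority)))


# Hypothetical labels sized like the cohort: 31 events, 330 non-events.
labels = [1] * 31 + [0] * 330
subset = random_undersample(labels, seed=1)
print(len(subset), sum(labels[i] for i in subset))  # 62 31
```

Undersampling discards most non-events in any single round. Boosting over many rounds with fresh samples lets the ensemble still see most of the majority class.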

  • Research Article
  • Cited by 2
  • 10.1097/md.0000000000038513
Performance evaluation of ML models for preoperative prediction of HER2-low BC based on CE-CBBCT radiomic features: A prospective study
  • Jun 14, 2024
  • Medicine
  • Xianfei Chen + 3 more

To explore the value of machine learning (ML) models based on contrast-enhanced cone-beam breast computed tomography (CE-CBBCT) radiomic features for the preoperative prediction of human epidermal growth factor receptor 2 (HER2)-low expression breast cancer (BC). Fifty-six patients with HER2-negative invasive BC who underwent preoperative CE-CBBCT were prospectively analyzed. Patients were randomly divided into training and validation cohorts at a ratio of approximately 7:3. A total of 1046 quantitative radiomic features were extracted from CE-CBBCT images and normalized using z-scores. The Pearson correlation coefficient and recursive feature elimination were used to identify the optimal features. Six ML models were constructed from the selected features: linear discriminant analysis (LDA), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost (AB), and decision tree (DT). Receiver operating characteristic curves and the area under the curve (AUC) were used to evaluate model performance. Seven features were selected as optimal for constructing the ML models. In the training cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.984, 0.981, 1.000, 0.970, 1.000, and 1.000, respectively. In the validation cohort, the corresponding AUC values were 0.859, 0.880, 0.781, 0.880, 0.750, and 0.713. Among all ML models, the LDA and LR models performed best. The DeLong test showed no significant differences among the receiver operating characteristic curves of the ML models in the training cohort (P > .05). In the validation cohort, however, the AUC of LDA differed significantly from those of RF, AB, and DT (P = .037, .003, and .046, respectively), as did the AUC of LR (P = .023, .005, and .030, respectively). No statistically significant differences were observed in the remaining pairwise comparisons. ML models based on CE-CBBCT radiomic features achieved excellent performance in the preoperative prediction of HER2-low BC and could potentially serve as an effective tool to support precise, personalized targeted therapy.
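The Methods describe z-score normalization followed by Pearson-correlation screening, which narrowed 1046 features to a final seven after recursive feature elimination. The sketch below shows the z-score and correlation steps as a greedy redundancy filter. The 0.9 cutoff and the function names are hypothetical, and recursive feature elimination is not shown.

```python
import statistics
from typing import Dict, List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0, SD 1 (population SD); a constant feature maps to zeros."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r computed as the mean product of population z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def drop_correlated(features: Dict[str, Sequence[float]], cutoff: float = 0.9) -> List[str]:
    """Greedy filter: keep a feature only if |r| <= cutoff with every feature already kept."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical features: "b" is a rescaled copy of "a", so it is dropped.
print(drop_correlated({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}))  # ['a', 'c']
```

With only 56 patients and over a thousand candidate features, removing near-duplicate features before model fitting reduces overfitting risk. The perfect training AUCs of RF, AB, and DT alongside their lower validation AUCs show how quickly that risk can materialize.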

  • Research Article
  • Cited by 13
  • 10.1007/s00261-021-03051-6
Predicting the stages of liver fibrosis with multiphase CT radiomics based on volumetric features.
  • Mar 22, 2021
  • Abdominal Radiology
  • Enming Cui + 6 more

To develop and externally validate a multiphase computed tomography (CT)-based machine learning (ML) model for staging liver fibrosis (LF) using whole-liver slices. The development dataset comprised 232 patients with pathological assessment of LF, and the test dataset comprised 100 patients from an independent outside institution. Features were extracted from precontrast phase (PCP), arterial phase (AP), portal venous phase (PVP), and three-phase CT images. CatBoost was used to build the ML models from features with good reproducibility. The diagnostic performance of ML models based on each single-phase and the three-phase CT images was compared with that of radiologists' interpretations, the aspartate aminotransferase-to-platelet ratio index (APRI), and the fibrosis index based on four factors (FIB-4), using receiver operating characteristic curves and area under the curve (AUC) values. The ML model based on three-phase CT images (AUC = 0.65-0.80) achieved higher AUC values than those based on PCP (AUC = 0.56-0.69) and PVP (AUC = 0.51-0.74) in predicting the various stages of LF, but the differences were not significant. The best CT-based ML model (AUC = 0.65-0.80) outperformed FIB-4 in differentiating advanced LF and cirrhosis, and outperformed radiologists' interpretation (AUC = 0.50-0.76) in diagnosing significant and advanced LF. The PCP-, PVP-, and three-phase CT-based ML models are all acceptable for assessing LF, and the performance of the PCP-based ML model is comparable to that of ML models based on contrast-enhanced CT images.
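The CT models were benchmarked against two serum indices. For reference, the sketch below computes them with their standard formulas: APRI = (AST / AST upper limit of normal) × 100 / platelet count (10^9/L), and FIB-4 = (age × AST) / (platelet count (10^9/L) × √ALT). The AST upper limit of normal is laboratory-specific, and the abstract does not report the values or staging cutoffs used in the study. The function names are hypothetical.

```python
import math


def apri(ast_u_per_l: float, ast_uln_u_per_l: float, platelets_10e9_per_l: float) -> float:
    """AST-to-platelet ratio index; the AST upper limit of normal is lab-specific."""
    if min(ast_u_per_l, ast_uln_u_per_l, platelets_10e9_per_l) <= 0:
        raise ValueError("inputs must be positive")
    return (ast_u_per_l / ast_uln_u_per_l) * 100.0 / platelets_10e9_per_l


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """Fibrosis-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    if min(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l) <= 0:
        raise ValueError("inputs must be positive")
    return (age_years * ast_u_per_l) / (platelets_10e9_per_l * math.sqrt(alt_u_per_l))


# Hypothetical patient values.
print(apri(80, 40, 100))      # 2.0
print(fib4(50, 40, 25, 200))  # 2.0
```

Both indices need only routine blood tests, which makes them a low-cost baseline that an imaging-based model has to beat to justify its added complexity.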

  • Research Article
  • Cited by 27
  • 10.1016/j.eswa.2023.120649
Comparative study on the performance of different machine learning techniques to predict the shear strength of RC deep beams: Model selection and industry implications
  • Jun 3, 2023
  • Expert Systems with Applications
  • Khuong Le Nguyen + 3 more

  • Research Article
  • 10.1186/s12874-025-02694-z
Comparison of machine learning methods versus traditional Cox regression for survival prediction in cancer using real-world data: a systematic literature review and meta-analysis
  • Oct 28, 2025
  • BMC Medical Research Methodology
  • Yinan Huang + 6 more

Background: Accurate prediction of survival in oncology can guide targeted interventions. The traditional regression-based Cox proportional hazards (CPH) model relies on statistical assumptions, such as proportional hazards, and may have limited predictive accuracy. With its capacity to model large datasets, machine learning (ML) has the potential to improve the prediction of time-to-event outcomes such as cancer survival. This study aimed to systematically summarize the use of ML models for cancer survival outcomes in observational studies and to compare the performance of ML models with that of CPH models.
Methods: We systematically searched PubMed, MEDLINE (via EBSCO), and Embase for studies that evaluated ML models vs. CPH models for cancer survival outcomes. The use of ML algorithms was summarized, and the area under the curve (AUC) or concordance index (C-index) of the ML and CPH models was presented descriptively. Only studies that reported a measure of discrimination (AUC or C-index) with a 95% confidence interval (CI) were included in the meta-analysis. A random-effects model, fitted in R, was used to compare pooled AUC or C-index estimates between ML and CPH models. Study quality was evaluated using available checklists, and multiple sensitivity analyses were performed.
Results: A total of 21 studies were included in the systematic review and 7 in the meta-analysis. Across the 21 articles, diverse ML models were used, including random survival forest (N=16, 76.19%), gradient boosting (N=5, 23.81%), and deep learning (N=8, 38.09%). ML models did not outperform CPH regression in predicting cancer survival outcomes: the standardized mean difference in AUC or C-index was 0.01 (95% CI: -0.01 to 0.03). Sensitivity analyses confirmed the robustness of the main findings.
Conclusions: ML models performed similarly to CPH models in predicting cancer survival outcomes. Although this systematic review highlights the promising use of ML to improve the quality of care in oncology, its findings also suggest opportunities to improve the transparency of ML reporting. Future systematic reviews should focus on the comparative performance of specific ML models and CPH regression for time-to-event outcomes in specific types of cancer or other disease areas.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-025-02694-z.
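The meta-analysis pooled study-level discrimination estimates with a random-effects model. The abstract says only that this was done in R and does not name the estimator. The sketch below assumes the common DerSimonian-Laird estimator and recovers standard errors from reported 95% CIs, as the inclusion criterion implies. The function names and toy inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z)


def dersimonian_laird(effects: Sequence[float], ses: Sequence[float],
                      z: float = 1.96) -> Tuple[float, float, float, float]:
    """Random-effects pooled estimate; returns (pooled, lower, upper, tau2)."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / s ** 2 for s in ses]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical differences (ML minus CPH) from two studies that disagree.
print(dersimonian_laird([0.0, 1.0], [0.1, 0.1]))  # pooled 0.5, tau2 about 0.49
```

When the studies agree, tau2 collapses to zero and the result equals the fixed-effect estimate. When they disagree, tau2 widens the pooled interval, which is why a random-effects model suits a review spanning different cancers and algorithms.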

  • Research Article
  • 10.1136/bmjopen-2025-108527
Development of explainable machine learning models to predict side effects in patients with rheumatoid arthritis taking methotrexate treatment: a nationwide multicentre cohort study
  • Nov 1, 2025
  • BMJ Open
  • Junbeom Jang + 3 more

Objectives: Methotrexate (MTX) effectively controls rheumatoid arthritis (RA) but often causes side effects (SE) such as gastrointestinal (GI) issues, liver toxicity, and bone marrow suppression. This study aimed to develop clinically interpretable machine learning (ML) models that accurately predict MTX-related SE in patients with RA taking MTX, to enhance predictive accuracy, and to identify patient-specific risk factors using explainable artificial intelligence (XAI), thereby enabling transparent clinical interpretation. We specifically sought to address the unmet need for individualised risk stratification using real-world, multicentre observational data.
Design: Retrospective case-control study.
Setting: 23 rheumatology clinics in South Korea, using data from a nationwide multicentre cohort.
Participants: A total of 5077 patients with RA were initially enrolled from the Korean Observational Study Network for Arthritis. After excluding patients with missing clinical, demographic, or prescription data and those not receiving MTX, 2375 patients remained eligible. Of these, 1654 and 1218 patients were included in the overall SE and GI SE analysis groups, respectively, after 1:1 propensity score matching. All patients were aged ≥18 years and met the 1987 American College of Rheumatology classification criteria.
Primary and secondary outcome measures: The primary outcome was the presence of SE in patients with RA taking MTX, categorised into overall SE and GI SE, based on standardised patient questionnaires and clinical assessments. The secondary outcome was the identification of key predictors using SHapley Additive exPlanations (SHAP) to enhance the interpretability of ML predictions.
Results: Among six ML classifiers, extreme gradient boosting performed best in predicting overall SE (area under the curve (AUC) 0.781, F1 score 0.672, area under the precision-recall curve (AUPRC) 0.757) and GI SE (AUC 0.701, F1 score 0.690, AUPRC 0.670). SHAP analysis identified key predictive features including age, physician visual analogue scale score, alanine aminotransferase, Health Assessment Questionnaire score, celecoxib use, and drug adherence. Logistic regression confirmed the statistical significance of multiple variables (eg, OR 4.63; 95% CI 1.41 to 20.90 for non-adherence >30 days; OR 1.45; 95% CI 1.14 to 1.85 for celecoxib use). DeLong's test indicated that the boosting models significantly outperformed the support vector machine (p<0.001).
Conclusions: Interpretable ML models using real-world clinical data can accurately predict SE in patients with RA taking MTX. These models may facilitate early identification of high-risk individuals and inform personalised treatment strategies. Integration into clinical decision support systems could improve MTX safety monitoring. Further prospective validation in external cohorts is warranted.
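The analysis groups were formed by 1:1 propensity score matching. Below is a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement. It assumes the propensity scores have already been estimated (for example, by logistic regression of group membership on covariates). The covariates, caliper, and matching order used in the study are not given in the abstract, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def greedy_match(scores: Sequence[float], treated: Sequence[int],
                 caliper: Optional[float] = None) -> List[Tuple[int, int]]:
    """1:1 greedy nearest-neighbour matching on propensity score, without replacement.

    Cases (treated == 1) are processed in ascending score order, which is an
    arbitrary choice. Returns (case_index, control_index) pairs.
    """
    controls = {i for i, t in enumerate(treated) if t == 0}
    cases = sorted((i for i, t in enumerate(treated) if t == 1), key=lambda i: scores[i])
    pairs: List[Tuple[int, int]] = []
    for i in cases:
        if not controls:
            break
        # Nearest remaining control; ties broken by the lower index for determinism.
        j = min(controls, key=lambda c: (abs(scores[c] - scores[i]), c))
        if caliper is not None and abs(scores[j] - scores[i]) > caliper:
            continue  # nearest control is too far away: leave this case unmatched
        pairs.append((i, j))
        controls.remove(j)
    return pairs


# Hypothetical scores: indices 0 and 1 are cases, 2 to 4 are controls.
print(greedy_match([0.2, 0.8, 0.25, 0.75, 0.5], [1, 1, 0, 0, 0]))  # [(0, 2), (1, 3)]
```

Greedy matching is simple but order-dependent. Optimal matching, or a caliper defined on the logit of the score, are common alternatives, and covariate balance should be checked after matching either way.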

  • Research Article
  • Cited by 3
  • 10.1186/s12879-025-10958-8
A machine learning model for predicting severe mycoplasma pneumoniae pneumonia in school-aged children
  • Apr 21, 2025
  • BMC Infectious Diseases
  • Yingying Ye + 5 more

Objective: To develop an interpretable machine learning (ML) model for predicting severe Mycoplasma pneumoniae pneumonia (SMPP) and to identify reliable factors for predicting the clinical type of the disease. Methods: We collected clinical data from 483 school-aged children with M. pneumoniae pneumonia (MPP) who were hospitalized at the Children's Hospital of Soochow University between September 2021 and June 2024. Difference analysis and univariate logistic regression were used to select predictors as training features for ML. Eight ML algorithms were used to build models on the selected features, and their effectiveness was validated. Model performance was evaluated with the area under the curve (AUC), accuracy, five-fold cross-validation, and decision curve analysis (DCA). Finally, the best-performing ML model was selected, and the Shapley Additive Explanations (SHAP) method was applied to rank the importance of clinical features and interpret the final model. Results: After feature selection, 30 variables remained. Of the eight ML models constructed, the CatBoost model showed the best predictive performance, with an AUC of 0.934 and an accuracy of 0.9175. DCA comparing the clinical benefit of the models showed that the CatBoost model provided greater net benefit than the other ML models across threshold probabilities of 34% to 75%. SHAP was then applied to interpret the CatBoost model, and SHAP plots visualized the influence of each predictor variable on the outcome. The top six risk factors were the number of days with fever, D-dimer, platelet count (PLT), C-reactive protein (CRP), lactate dehydrogenase (LDH), and the neutrophil-to-lymphocyte ratio (NLR). Conclusions: The interpretable CatBoost model can help physicians accurately identify school-aged children with SMPP. This early identification facilitates better treatment choices and timely prevention of complications. Furthermore, SHAP makes the model more transparent and increases its trustworthiness in practical applications.
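The decision curve analysis above compares models by net benefit over a range of threshold probabilities. A minimal sketch of the standard net-benefit calculation follows, assuming binary 0/1 outcome labels and predicted probabilities; the function names and toy data are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging every patient whose predicted risk >= threshold.

    NB(t) = TP/n - FP/n * t / (1 - t), where t is the threshold probability.
    Assumes y_true holds 0/1 labels.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    flagged = [y for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for y in flagged if y == 1)
    fp = len(flagged) - tp
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy that flags everyone, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Return (threshold, model NB, treat-all NB) tuples; treat-none is always 0."""
    return [
        (t, net_benefit(y_true, y_prob, t), treat_all_net_benefit(y_true, t))
        for t in thresholds
    ]


if __name__ == "__main__":
    # Toy labels and predicted probabilities (illustrative only, not study data).
    y = [1, 1, 0, 0]
    p = [0.9, 0.6, 0.4, 0.1]
    # Sweep thresholds from 0.34 to 0.75 in steps of 0.01.
    grid = [round(0.34 + 0.01 * i, 2) for i in range(42)]
    for t, nb_model, nb_all in decision_curve(y, p, grid):
        print(f"t={t:.2f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

A model is said to add clinical value at a threshold when its net benefit exceeds both the treat-all curve and the treat-none baseline of zero; comparing several models means plotting one such curve per model.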

  • Research Article
  • 10.2147/jir.s476716
Interpretable Machine Learning Model Based on Superb Microvascular Imaging for Non-Invasive Determination of Crescent Status of IgAN.
  • Sep 1, 2024
  • Journal of inflammation research
  • Yan Tang + 3 more

To assess the crescentic status of IgA nephropathy (IgAN) non-invasively using a superb microvascular imaging (SMI)-based radiomics machine learning (ML) model. The study enrolled IgAN patients who underwent renal biopsy from June 2022 to October 2023 and had two-dimensional ultrasound (US) and SMI examinations one day before the biopsy. Patients were randomly divided into a training group and a test group at a 7:3 ratio. Radiomic features were extracted from US and SMI images and used to build ML models with logistic regression (LR) and extreme gradient boosting (XGBoost) to determine the crescentic status. Model utility was evaluated with receiver operating characteristic (ROC) analysis, calibration, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) were used to explain the best-performing ML model. A total of 147 IgAN patients were included, with 103 in the training group and 44 in the test group. The US-SMI-based XGBoost model achieved the best results, with an area under the curve (AUC) of 0.839 (95% CI, 0.756-0.910) and an accuracy of 78.6% in the training group. In the test group, the AUC was 0.859 (95% CI, 0.721-0.964) and the accuracy was 81.8%, significantly surpassing the single-modality ML models and the clinical model based on occult blood. DCA further showed that the XGBoost model provided a higher overall net benefit in both groups. The SMI radiomics ML model can accurately predict the crescentic status of IgAN patients, providing effective assistance for clinical treatment decisions.
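The abstract reports AUCs with 95% confidence intervals but does not state how the intervals were computed. The sketch below shows one common approach, assumed here rather than confirmed by the source: the AUC is computed as the probability that a random positive case outranks a random negative case (ties count one half), and the interval comes from a percentile bootstrap. The function names, resample count, and toy data are hypothetical.

```python
import random


def auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI (assumed method, not the study's).

    Resamples that contain only one class are skipped, since AUC is undefined there.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(auc(yb, [scores[i] for i in idx]))
    if not stats:
        raise ValueError("no bootstrap resample contained both classes")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return auc(y_true, scores), lo, hi


if __name__ == "__main__":
    # Toy labels and model scores (illustrative only, not study data).
    y = [1, 1, 0, 0, 1, 0, 1, 0]
    s = [0.9, 0.3, 0.4, 0.1, 0.7, 0.8, 0.6, 0.2]
    est, lo, hi = bootstrap_auc_ci(y, s)
    print(f"AUC = {est:.3f} (95% CI, {lo:.3f}-{hi:.3f})")
```

With a small test group such as 44 patients, intervals from this kind of resampling tend to be wide, which is consistent with the broader test-group CI quoted above; a DeLong interval is a common analytic alternative.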
