Performance of machine learning algorithms in diffusion tensor imaging of movement disorders: an exploratory meta-analysis.

Abstract

Machine learning (ML) applied to diffusion tensor imaging (DTI) has emerged as a promising tool for detecting microstructural brain alterations in movement disorders. However, existing studies vary widely in design, sample size, imaging pipelines, and analytic rigor, and the resulting methodological heterogeneity limits quantitative comparability. This exploratory meta-analysis and narrative synthesis aimed to characterize performance trends, methodological diversity, and sources of variability among ML models trained on DTI data for classifying movement disorders, rather than to infer a single pooled diagnostic effect. The analysis was designated exploratory because extreme heterogeneity precluded confirmatory pooled inference. A systematic search of PubMed, Web of Science, and Scopus identified human studies applying ML algorithms to DTI for diagnostic or classification purposes. Accuracy, sensitivity, specificity, and the area under the curve (AUC) were extracted, and multiple imputation was used for incomplete metrics with missingness rates below 40%. Random-effects modeling was employed to provide descriptive summaries, and subgroup analyses explored trends across disorders, model architectures, and imaging modalities. Study quality was assessed with JBI tools. Forty-six studies (2016-2024) were included, spanning Parkinson's disease, Tourette syndrome, and essential tremor. Reported performance was generally high (median AUC ≈ 0.91), but between-study heterogeneity was extreme (I² = 94.7%), indicating that studies were estimating distinct effects. Disorder-specific subgroup AUCs varied markedly: essential tremor (0.95), Parkinson's disease (0.90), Tourette syndrome (0.88), and Other (0.79). Deep learning and radiomics-based models reported higher accuracies, but they were often trained on small, single-center cohorts (37-139 participants), which limits their external validity. Pooled statistics were presented descriptively to illustrate performance ranges despite high heterogeneity and were not interpreted as confirmatory effect sizes. ML models using DTI show high internal performance across multiple movement disorders, but their generalizability remains limited, and current evidence remains exploratory because of small sample sizes, methodological fragmentation, and a lack of standardized imaging pipelines. Rather than confirmatory inference, these findings provide a descriptive map of emerging trends in ML-DTI diagnostics. Future progress will depend on data harmonization initiatives, multicenter collaborations, and federated learning frameworks that can support reproducible, generalizable, and clinically interpretable models.
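
The random-effects summary and the I² heterogeneity statistic mentioned in the abstract can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator (the abstract does not name the estimator used), pools on the raw AUC scale for simplicity, and uses hypothetical AUCs and within-study variances rather than data from the included studies. The function name `random_effects_summary` is illustrative only.

```python
import math


def random_effects_summary(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2.

    effects   : per-study effect estimates (here: AUCs, raw scale)
    variances : per-study within-study variances of those estimates
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Method-of-moments estimate of the between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "I2": i2}


# Hypothetical inputs, NOT taken from the meta-analysis
example_aucs = [0.96, 0.91, 0.85, 0.78]
example_variances = [0.0004, 0.0003, 0.0006, 0.0009]
summary = random_effects_summary(example_aucs, example_variances)
print({name: round(value, 4) for name, value in summary.items()})
```

When I² is very high, as reported here, the pooled value mainly describes the center of a wide distribution of study-specific effects, which is consistent with the authors treating it as descriptive rather than confirmatory.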

Similar Papers
  • Research Article
  • Citations: 6
  • 10.1245/s10434-024-15197-w
Dual-Region Computed Tomography Radiomics-Based Machine Learning Predicts Subcarinal Lymph Node Metastasis in Patients with Non-small Cell Lung Cancer.
  • Mar 23, 2024
  • Annals of surgical oncology
  • Hao-Ji Yan + 11 more

Noninvasively and accurately predicting subcarinal lymph node metastasis (SLNM) for patients with non-small cell lung cancer (NSCLC) remains challenging. This study was designed to develop and validate a tumor and subcarinal lymph nodes (tumor-SLNs) dual-region computed tomography (CT) radiomics model for predicting SLNM in NSCLC. This retrospective study included NSCLC patients who underwent lung resection and SLNs dissection between January 2017 and December 2020. The radiomic features of the tumor and SLNs were extracted from preoperative CT, respectively. Ninety machine learning (ML) models were developed based on tumor region, SLNs region, and tumor-SLNs dual-region. The model performance was assessed by the area under the curve (AUC) and validated internally by fivefold cross-validation. In total, 202 patients were included in this study. ML models based on dual-region radiomics showed good performance for SLNM prediction, with a median AUC of 0.794 (range, 0.686-0.880), which was superior to those of models based on tumor region (median AUC, 0.746; range, 0.630-0.811) and SLNs region (median AUC, 0.700; range, 0.610-0.842). The ML model, which is developed by using the naive Bayes algorithm and dual-region features, had the highest AUC of 0.880 (range of cross-validation, 0.825-0.937) among all ML models. The optimal logistic regression model was inferior to the optimal ML model for predicting SLNM, with an AUC of 0.727. The CT radiomics showed the potential for accurately predicting SLNM in NSCLC patients. The ML model with dual-region radiomic features has better performance than the logistic regression or single-region models.

  • Research Article
  • Citations: 6
  • 10.1017/s1049023x24000414
Applications and Performance of Machine Learning Algorithms in Emergency Medical Services: A Scoping Review.
  • May 17, 2024
  • Prehospital and disaster medicine
  • Ahmad Alrawashdeh + 8 more

The aim of this study was to summarize the literature on the applications of machine learning (ML) and their performance in Emergency Medical Services (EMS). Four relevant electronic databases were searched (from inception through January 2024) for all original studies that employed EMS-guided ML algorithms to enhance the clinical and operational performance of EMS. Two reviewers screened the retrieved studies and extracted relevant data from the included studies. The characteristics of included studies, employed ML algorithms, and their performance were quantitatively described across primary domains and subdomains. This review included a total of 164 studies published from 2005 through 2024. Of those, 125 were clinical domain focused and 39 were operational. The characteristics of ML algorithms such as sample size, number and type of input features, and performance varied between and within domains and subdomains of applications. Clinical applications of ML algorithms involved triage or diagnosis classification (n = 62), treatment prediction (n = 12), or clinical outcome prediction (n = 50), mainly for out-of-hospital cardiac arrest/OHCA (n = 62), cardiovascular diseases/CVDs (n = 19), and trauma (n = 24). The performance of these ML algorithms varied, with a median area under the receiver operating characteristic curve (AUC) of 85.6%, accuracy of 88.1%, sensitivity of 86.05%, and specificity of 86.5%. Within the operational studies, the operational task of most ML algorithms was ambulance allocation (n = 21), followed by ambulance detection (n = 5), ambulance deployment (n = 5), route optimization (n = 5), and quality assurance (n = 3). The performance of all operational ML algorithms varied and had a median AUC of 96.1%, accuracy of 90.0%, sensitivity of 94.4%, and specificity of 87.7%. Generally, neural network and ensemble algorithms, to some degree, outperformed other ML algorithms. ML algorithms can improve the triage and management of different prehospital medical conditions and augment ambulance performance. Future reports should focus on a specific clinical condition or operational task to improve the precision of the performance metrics of ML models.

  • Research Article
  • Citations: 3
  • 10.1080/00325481.2022.2115735
Machine learning model for predicting 1-year and 3-year all-cause mortality in ischemic heart failure patients
  • Aug 21, 2022
  • Postgraduate Medicine
  • Anping Cai + 6 more

Objective: No machine learning (ML) model has been developed specifically for ischemic heart failure (HF) patients, and whether ML models perform better than the MAGGIC risk score and NT-proBNP is unknown. The current study applied ML algorithms to build a risk model for predicting 1-year and 3-year all-cause mortality in ischemic HF patients and compared the performance of the ML model with the MAGGIC risk score and NT-proBNP. Method: Three ML algorithms, without and with feature selection, were used for model exploration, and performance was determined based on the area under the curve (AUC) in five-fold cross-validation. The best performing ML model was selected and compared with the MAGGIC risk score and NT-proBNP. The calibration of the ML model was assessed by the Brier score. Results: Random forest with feature selection had the highest AUC (0.742; 95% CI: 0.697–0.787) for predicting 1-year all-cause mortality, and support vector machine without feature selection had the highest AUC (0.732; 95% CI: 0.694–0.707) for predicting 3-year all-cause mortality. When compared to the MAGGIC risk score and NT-proBNP, the ML model had a comparable AUC for predicting 1-year (0.742 vs 0.714 vs 0.694) and 3-year all-cause mortality (0.732 vs 0.712 vs 0.682). Brier scores for predicting 1-year and 3-year all-cause mortality were 0.068 and 0.174, respectively. Conclusion: ML models predicted prognosis in ischemic HF with good discrimination and good calibration. These models may be used by clinicians as a decision-making tool to estimate the prognosis of ischemic HF patients.
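
As a brief illustration of the calibration metric named in this abstract, the Brier score is the mean squared difference between predicted probabilities and observed 0/1 outcomes. The sketch below uses hypothetical labels and predictions, not data from the study, and the helper names `brier_score` and `base_rate_brier` are illustrative. It also shows that a model which always predicts the event rate p scores p(1 − p), so Brier scores obtained at different event rates are not directly comparable.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def base_rate_brier(event_rate):
    """Brier score of an uninformative model that always predicts the event rate."""
    return event_rate * (1.0 - event_rate)


# Hypothetical outcomes (1 = died) and predicted risks, for illustration only
outcomes = [1, 0, 0, 0]
predicted_risks = [0.7, 0.2, 0.1, 0.3]
print("model Brier score:", brier_score(outcomes, predicted_risks))
print("base-rate reference:", base_rate_brier(sum(outcomes) / len(outcomes)))
```

Lower values are better, and comparing a model's score against the base-rate reference for the same outcome gives a rough sense of how much the predictions add.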

  • Research Article
  • Citations: 12
  • 10.3389/fbioe.2022.903426
Performance of Machine Learning Algorithms for Predicting Adverse Outcomes in Community-Acquired Pneumonia
  • Jun 29, 2022
  • Frontiers in Bioengineering and Biotechnology
  • Zhixiao Xu + 4 more

Background: The ability to assess adverse outcomes in patients with community-acquired pneumonia (CAP) could improve clinical decision-making to enhance clinical practice, but the studies remain insufficient, and similarly, few machine learning (ML) models have been developed. Objective: We aimed to explore the effectiveness of predicting adverse outcomes in CAP through ML models. Methods: A total of 2,302 adults with CAP who were prospectively recruited between January 2012 and March 2015 across three cities in South America were extracted from DryadData. After a 70:30 training set: test set split of the data, nine ML algorithms were executed and their diagnostic accuracy was measured mainly by the area under the curve (AUC). The nine ML algorithms included decision trees, random forests, extreme gradient boosting (XGBoost), support vector machines, Naïve Bayes, K-nearest neighbors, ridge regression, logistic regression without regularization, and neural networks. The adverse outcomes included hospital admission, mortality, ICU admission, and one-year post-enrollment status. Results: The XGBoost algorithm had the best performance in predicting hospital admission. Its AUC reached 0.921, and accuracy, precision, recall, and F1-score were better than those of other models. In the prediction of ICU admission, a model trained with the XGBoost algorithm showed the best performance with AUC 0.801. XGBoost algorithm also did a good job at predicting one-year post-enrollment status. The results of AUC, accuracy, precision, recall, and F1-score indicated the algorithm had high accuracy and precision. In addition, the best performance was seen by the neural network algorithm when predicting death (AUC 0.831). Conclusions: ML algorithms, particularly the XGBoost algorithm, were feasible and effective in predicting adverse outcomes of CAP patients. The ML models based on available common clinical features had great potential to guide individual treatment and subsequent clinical decisions.

  • Research Article
  • Citations: 34
  • 10.1007/s00330-020-07083-2
Improved long-term prognostic value of coronary CT angiography-derived plaque measures and clinical parameters on adverse cardiac outcome using machine learning
  • Jul 28, 2020
  • European Radiology
  • Christian Tesche + 13 more

To evaluate the long-term prognostic value of coronary CT angiography (cCTA)-derived plaque measures and clinical parameters on major adverse cardiac events (MACE) using machine learning (ML). Datasets of 361 patients (61.9 ± 10.3 years, 65% male) with suspected coronary artery disease (CAD) who underwent cCTA were retrospectively analyzed. MACE was recorded. cCTA-derived adverse plaque features and conventional CT risk scores together with cardiovascular risk factors were provided to a ML model to predict MACE. A boosted ensemble algorithm (RUSBoost) utilizing decision trees as weak learners with repeated nested cross-validation to train and validate the model was used. Performance of the ML model was calculated using the area under the curve (AUC). MACE was observed in 31 patients (8.6%) after a median follow-up of 5.4 years. Discriminatory power was significantly higher for the ML model (AUC 0.96 [95%CI 0.93-0.98]) compared with conventional CT risk scores including Agatston calcium score (AUC 0.84 [95%CI 0.80-0.87]), segment involvement score (AUC 0.88 [95%CI 0.84-0.91]), and segment stenosis score (AUC 0.89 [95%CI 0.86-0.92], all p < 0.05). Similar results were shown for adverse plaque measures (AUCs 0.72-0.82, all p < 0.05) and clinical parameters including the Framingham risk score (AUCs 0.71-0.76, all p < 0.05). The ML model yielded significantly higher diagnostic performance compared with logistic regression analysis (AUC 0.96 vs. 0.92, p = 0.024). Integration of a ML model improves the long-term prediction of MACE when compared with conventional CT risk scores, adverse plaque measures, and clinical information. ML algorithms may improve the integration of patient's information to enhance risk stratification.
• A machine learning (ML) model portends high discriminatory power to predict major adverse cardiac events (MACE).
• ML-based risk stratification shows superior diagnostic performance for MACE prediction over coronary CT angiography (cCTA)-derived risk scores or clinical parameters alone.
• A ML model outperforms conventional logistic regression analysis for the prediction of MACE.

  • Research Article
  • Citations: 14
  • 10.1007/s00261-021-03051-6
Predicting the stages of liver fibrosis with multiphase CT radiomics based on volumetric features.
  • Mar 22, 2021
  • Abdominal Radiology
  • Enming Cui + 6 more

To develop and externally validate a multiphase computed tomography (CT)-based machine learning (ML) model for staging liver fibrosis (LF) by using whole liver slices. The development dataset comprised 232 patients with pathological analysis for LF, and the test dataset comprised 100 patients from an independent outside institution. Feature extraction was performed based on the precontrast (PCP), arterial (AP), portal vein (PVP) phase, and three-phase CT images. CatBoost was utilized for ML model investigation by using the features with good reproducibility. The diagnostic performance of ML models based on each single- and three-phase CT image was compared with that of radiologists' interpretations, the aminotransferase-to-platelet ratio index, and the fibrosis index based on four factors (FIB-4) by using the receiver operating characteristic curve with the area under the curve (AUC) value. Although the ML model based on three-phase CT images (AUC = 0.65-0.80) achieved higher AUC values than those based on PCP (AUC = 0.56-0.69) and PVP (AUC = 0.51-0.74) in predicting the various stages of LF, no significant difference was found. The best CT-based ML model (AUC = 0.65-0.80) outperformed the FIB-4 in differentiating advanced LF and cirrhosis and radiologists' interpretation (AUC = 0.50-0.76) in the diagnosis of significant and advanced LF. All PCP, PVP, and three-phase CT-based ML models can be acceptable for assessing LF, and the performance of the PCP-based ML model is comparable to that of the enhanced CT image-based ML model.

  • Research Article
  • 10.1182/blood-2024-211964
Systematic Review of Machine Learning Models for Myelodysplastic Syndrome Diagnosis
  • Nov 5, 2024
  • Blood
  • Karna Desai + 5 more

  • Research Article
  • Citations: 1
  • 10.1186/s12874-025-02694-z
Comparison of machine learning methods versus traditional Cox regression for survival prediction in cancer using real-world data: a systematic literature review and meta-analysis
  • Oct 28, 2025
  • BMC Medical Research Methodology
  • Yinan Huang + 6 more

Background: Accurate prediction of survival in oncology can guide targeted interventions. The traditional regression-based Cox proportional hazards (CPH) model has statistical assumptions and may have limited predictive accuracy. With the capability to model large datasets, machine learning (ML) holds the potential to improve the prediction of time-to-event outcomes, such as cancer survival outcomes. The present study aimed to systematically summarize the use of ML models for cancer survival outcomes in observational studies and to compare the performance of ML models with CPH models. Methods: We systematically searched PubMed, MEDLINE (via EBSCO), and Embase for studies that evaluated ML models vs. CPH models for cancer survival outcomes. The use of ML algorithms was summarized, and either the area under the curve (AUC) or the concordance index (C-index) for the ML and CPH models was presented descriptively. Only studies that provided a measure of discrimination, i.e., AUC or C-index, and 95% confidence interval (CI) were included in the final meta-analysis. A random-effects model was used to compare the predictive performance in the pooled AUC or C-index estimates between ML and CPH models using R. The quality of the studies was evaluated using available checklists. Multiple sensitivity analyses were performed. Results: A total of 21 studies were included for systematic review and 7 for meta-analysis. Across the 21 articles, diverse ML models were used, including random survival forest (N=16, 76.19%), gradient boosting (N=5, 23.81%), and deep learning (N=8, 38.09%). In predicting cancer survival outcomes, ML models showed no superior performance over CPH regression. The standardized mean difference in AUC or C-index was 0.01 (95% CI: -0.01 to 0.03). Results from the sensitivity analyses confirmed the robustness of the main findings. Conclusions: ML models had similar performance compared with CPH models in predicting cancer survival outcomes. Although this systematic review highlights the promising use of ML to improve the quality of care in oncology, findings from this review also suggest opportunities to improve ML reporting transparency. Future systematic reviews should focus on the comparative performance between specific ML models and CPH regression in time-to-event outcomes in specific types of cancer or other disease areas. Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-025-02694-z.

  • Research Article
  • Citations: 3
  • 10.1016/j.arth.2024.10.060
Racial and Ethnic Disparities in Predictive Accuracy of Machine Learning Algorithms Developed Using a National Database for 30-Day Complications Following Total Joint Arthroplasty
  • Oct 20, 2024
  • The Journal of Arthroplasty
  • Christian A Pean + 6 more

  • Research Article
  • Citations: 7
  • 10.1016/j.healun.2022.03.010
Noninvasive monitoring of allograft rejection in a rat lung transplant model: Application of machine learning-based 18F-fluorodeoxyglucose positron emission tomography radiomics.
  • Jun 1, 2022
  • The Journal of Heart and Lung Transplantation
  • Dong Tian + 8 more

  • Research Article
  • Citations: 7
  • 10.3389/fped.2024.1330420
Predicting preterm birth using auto-ML frameworks: a large observational study using electronic inpatient discharge data.
  • Jan 31, 2024
  • Frontiers in Pediatrics
  • Deming Kong + 5 more

To develop and compare different AutoML frameworks and machine learning models to predict premature birth. The study used a large electronic medical record database to include 715,962 participants who had the principal diagnosis code of childbirth. Three automatic machine learning (AutoML) frameworks were used to construct machine learning models, including tree-based models, ensemble models, and deep neural networks, on the training sample (N = 536,971). The area under the curve (AUC) and training times were used to assess the performance of the prediction models, and feature importance was computed via permutation-shuffling. The H2O AutoML framework had the highest median AUC of 0.846, followed by AutoGluon (median AUC: 0.840) and Auto-sklearn (median AUC: 0.820), and the median training time was the lowest for H2O AutoML (0.14 min), followed by AutoGluon (0.16 min) and Auto-sklearn (4.33 min). Among different types of machine learning models, the Gradient Boosting Machines (GBM) or Extreme Gradient Boosting (XGBoost), stacked ensemble, and random forest models had better predictive performance, with median AUC scores of 0.846, 0.846, and 0.842, respectively. Important features related to preterm birth included premature rupture of membrane (PROM), incompetent cervix, occupation, and preeclampsia. Our study highlights the potential of machine learning models in predicting the risk of preterm birth using readily available electronic medical record data, which has significant implications for improving prenatal care and outcomes.

  • Research Article
  • Citations: 4
  • 10.1016/j.bbmt.2019.12.707
Successful Personalization of Propylene Glycol Free Melphalan (PGF-MEL) for Multiple Myeloma (MM) and AL Amyloidosis (AL) Patients Undergoing Autologous Hematopoietic Stem Cell Transplant (AHCT) Using Pharmacokinetic (PK)-Directed Dosing
  • Jan 23, 2020
  • Biology of Blood and Marrow Transplantation
  • Gunjan L Shah + 14 more

  • Supplementary Content
  • Citations: 1
  • 10.1002/jmri.70122
Application of Machine Learning in the Diagnosis and Prognosis of Mild Traumatic Brain Injury Using Diffusion Tensor Imaging: A Systematic Review
  • Sep 30, 2025
  • Journal of Magnetic Resonance Imaging
  • Christian John A Saludar + 8 more

Background: Traumatic Brain Injury (TBI) is a global health concern, with mild TBI (mTBI) being the most common form. Despite its prevalence, accurately diagnosing mTBI remains a significant challenge. While advanced neuroimaging techniques like diffusion tensor imaging (DTI) offer promise for more robust diagnosis, their clinical application is limited by inconsistent and heterogeneous post-injury findings. Recently, machine learning (ML) techniques, utilizing DTI metrics as features, have shown increasing utility in mTBI research. This approach helps identify distinct between-group features, paving the way for more precise and efficient diagnostic and prognostic tools. Purpose: This review aims to analyze studies employing ML techniques to assess changes in DTI metrics after mTBI. Study Type: Systematic review. Population or Subjects or Phantom or Specimen or Animal Model: We conducted a systematic review, adhering to PRISMA guidelines, on the application of ML with DTI for mTBI diagnosis and prognosis on human subjects. This review identified 36 articles. Field Strength/Sequence: N/A. Assessment: Study quality was assessed using the Modified QualSyst Assessment Tool. Statistical Tests: N/A. Results: The review found ML techniques using DTI metrics either alone or in combination with other modalities (i.e., structural MRI, functional MRI, clinical scores, or demographics) can effectively classify mTBI patients from controls. These approaches have also demonstrated potential in classifying mTBI patients according to the degree of recovery and symptom severity. In addition, these ML models showed strong predictive power toward cognitive scores and brain structural decline, as quantified by brain-predicted age difference. Data Conclusion: Larger, externally validated studies are needed to develop robust models for the diagnosis and prognosis of mTBI, using imaging biomarkers (including DTI) in conjunction with non-imaging, on-field, or clinical data. Despite the high predictive performance of ML algorithms, the clinical application remains distant, likely due to the small sample size of studies and lack of external validation, which raises concerns about overfitting. Evidence Level: 5. Technical Efficacy: Stage 1.

  • Research Article
  • 10.29271/jcpsp.2025.08.1007
Predicting Extracorporeal Shock Wave Lithotripsy Outcomes Using Machine Learning and the Triple-/Quadruple-D Scores.
  • Aug 1, 2025
  • Journal of the College of Physicians and Surgeons--Pakistan : JCPSP
  • Mucahit Gelmis + 5 more

To evaluate the predictive performance of the triple-D and quadruple-D scores integrated with machine learning (ML) models in determining stone-free outcomes after extracorporeal shock wave lithotripsy (ESWL), and to compare ML model performance and identify the key predictors influencing ESWL success. An observational study. Place and Duration of the Study: Department of Urology, Gaziosmanpasa Training and Research Hospital, Istanbul, Turkiye, from October 2020 to November 2024. A total of 309 patients who underwent ESWL were analysed. The patients were categorised into stone-free and non-stone-free groups based on post-treatment imaging. Clinical parameters, including quadruple-D score (stone volume, density, skin-to-stone distance [SSD], and location), were recorded. Three ML models (random forest [RF], logistic regression [LR], and neural network [NN]) were trained on 80% of the dataset and tested on 20%. Model performance was assessed using accuracy, area under the curve (AUC), precision, recall, and F1 score. The quadruple-D score (AUC: 0.724) demonstrated superior predictive power compared to the triple-D score (AUC: 0.700). Among ML models, RF achieved the highest accuracy (82.9%, AUC: 0.91), followed by NN (80.9%, AUC: 0.87) and LR (79.6%, AUC: 0.85). Significant predictors of ESWL success were stone density, volume, SSD, and the quadruple-D score, while age and body mass index (BMI) were not significant. Integrating the quadruple-D score with ML models, particularly RF, enhances the prediction of ESWL outcomes. Combining clinical expertise with computational intelligence can refine patient selection and optimise treatment strategies. However, prospective studies are needed to validate these findings. Keywords: Extracorporeal shock wave lithotripsy, Quadruple-D score, Machine learning, Random forest, Stone-free prediction.

  • Research Article
  • Citations: 9
  • 10.1007/s12012-024-09843-8
Development and Validation of Machine Learning Algorithms to Predict 1-Year Ischemic Stroke and Bleeding Events in Patients with Atrial Fibrillation and Cancer
  • Mar 18, 2024
  • Cardiovascular Toxicology
  • Bang Truong + 5 more

In this study, we leveraged a machine learning (ML) approach to develop and validate new assessment tools for predicting stroke and bleeding among patients with atrial fibrillation (AFib) and cancer. We conducted a retrospective cohort study including patients who were newly diagnosed with AFib with a record of cancer from the 2012–2018 Surveillance, Epidemiology, and End Results (SEER)-Medicare database. The ML algorithms were developed and validated separately for each outcome by fitting elastic net, random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and neural network models with tenfold cross-validation (train:test = 7:3). We obtained area under the curve (AUC), sensitivity, specificity, and F2 score as performance metrics. Model calibration was assessed using the Brier score. In sensitivity analysis, we resampled data using the Synthetic Minority Oversampling Technique (SMOTE). Among 18,388 patients with AFib and cancer, 523 (2.84%) had ischemic stroke and 221 (1.20%) had major bleeding within one year after AFib diagnosis. In prediction of ischemic stroke, RF significantly outperformed other ML models (AUC 0.916, 95% CI 0.887–0.945; sensitivity 0.868; specificity 0.801; F2 score 0.375; Brier score 0.035). However, the performance of ML algorithms in prediction of major bleeding was low, with the highest AUC achieved by RF (0.623, 95% CI 0.554–0.692). RF models performed better than CHA2DS2-VASc and HAS-BLED scores. SMOTE did not improve the performance of the ML algorithms. Our study demonstrated a promising application of ML in stroke prediction among patients with AFib and cancer. This tool may be leveraged in assisting clinicians to identify patients at high risk of stroke and optimize treatment decisions.
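
The gap between the high sensitivity and the low F2 score reported in this abstract follows from the low event rate, which a short worked calculation can make concrete. The sketch below assumes that the evaluation data had the same stroke prevalence as the full cohort (2.84%), which the abstract does not state. The helper names `precision_from_rates` and `f_beta` are illustrative.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta = 2 weights recall four times as heavily as precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1.0 + b2) * precision * recall / denom if denom > 0 else 0.0


# Values reported in the abstract for ischemic stroke prediction with RF.
# Assumption: test-set prevalence equals the cohort prevalence of 2.84%.
sens, spec, prev = 0.868, 0.801, 0.0284
ppv = precision_from_rates(sens, spec, prev)
print(f"implied precision: {ppv:.3f}")
print(f"implied F2 score: {f_beta(ppv, sens):.3f}")
```

Under that assumption the implied precision is about 0.11 and the implied F2 score is about 0.37, close to the reported 0.375. Roughly one in five non-events is flagged, and non-events outnumber events by more than thirty to one, so most positive predictions are false alarms.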
