Articles published on Risk Of Bias Assessment Tool
1699 Search results
- Research Article
- 10.1111/jocn.70365
- May 20, 2026
- Journal of clinical nursing
- Xiaosong Yu + 4 more
To systematically review the evidence on diagnostic prediction models for depression in patients with breast cancer. Systematic review. Ten databases were searched from inception to 22 August 2025, with an updated search on 17 December 2025, to identify original studies developing and/or validating diagnostic prediction models for depression in patients with breast cancer. Data were extracted using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) framework. Two reviewers independently assessed risk of bias and applicability of included studies using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Eleven studies were included. Reported area under the curve (AUC) values ranged from 0.784 to 0.890. All included studies were judged to be at high risk of bias, and seven raised high concerns regarding applicability. There was substantial heterogeneity in predictor selection across studies, with age, income level and family support being the most frequently reported predictors. Although preliminary research on diagnostic prediction models for depression in patients with breast cancer has been undertaken, their methodological quality remains weak. Reporting of external validation and calibration assessment was limited. Current evidence is therefore insufficient to support their routine use in nursing practice. Future research should standardise model development and validation and strengthen the evaluation of model performance. This review suggests that existing diagnostic prediction models for depression in patients with breast cancer are not yet sufficiently robust for routine nursing use, but may provide a reference for future nursing screening research and the optimisation of related tools. This review synthesises the available evidence on diagnostic prediction models for depression in patients with breast cancer and provides a basis for future model development, validation and optimisation. 
This review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis: Systematic Reviews and Meta-Analyses (TRIPOD-SRMA) checklist. No patient or public contribution.
- Research Article
- 10.1136/svn-2025-004020
- May 14, 2026
- Stroke and vascular neurology
- Song He + 13 more
Reperfusion therapy, including thrombolysis and thrombectomy, is crucial for ischaemic stroke treatment. However, patient outcomes often remain suboptimal. Conventional regression models show limited accuracy in predicting outcomes after reperfusion therapy. Machine learning predictive models offer potential by integrating multidimensional data. However, their relative advantages over conventional regression models in this context remain uncertain. We aimed to compare the performance of conventional regression models and machine learning models in predicting the prognosis of patients undergoing reperfusion therapy. We identified studies using regression or machine learning models to predict outcomes in patients with ischaemic stroke undergoing thrombolysis or thrombectomy. Model performance was summarised as the area under the receiver operating characteristic curve (AUC), with 95% CIs, for prediction of modified Rankin Scale (mRS), symptomatic intracranial haemorrhage (sICH) and mortality. Heterogeneity was assessed using Cochran's Q test. Pooled AUCs were calculated. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool (PROBAST) and reporting quality was assessed using the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement. In total, 53 studies were included, of which 37 reported AUCs with 95% CIs on validation datasets. Pooled analyses were conducted for mRS (n=37), sICH (n=14) and mortality (n=2). Specifically, 24 studies used conventional regression models (pooled AUC 0.80 (95% CI 0.77 to 0.82)), while 13 used machine learning models (0.86 (0.81 to 0.90)). Pooled machine learning model performance showed significant improvement over conventional regression models (p=0.004). 
Significant differences in model performance were also observed in thrombolysis subgroup (pooled AUC 0.79 (95% CI 0.77 to 0.82) for conventional regression models vs 0.88 (0.80 to 0.96) for machine learning models, p for interaction=0.009). Machine learning models generally outperformed conventional regression models in predicting outcomes after reperfusion therapy, highlighting their potential for prognostic prediction of patients with ischaemic stroke undergoing reperfusion therapy. However, the high risk of bias across studies and limited availability of external validation warrant cautious interpretation of the predictive performance.
- Research Article
- 10.2196/84844
- May 14, 2026
- Journal of medical Internet research
- Ying Gao + 8 more
Machine learning (ML) and deep learning (DL) show promise for fall risk prediction, but prior reviews focused mainly on real-time fall detection, in-hospital falls, or conventional statistical models. The performance of ML-DL-based models for predicting future falls in community-dwelling older adults remains unclear. This study aimed to review ML-DL studies for predicting future falls among community-dwelling older adults and meta-analyze discrimination where feasible. Six databases were searched from inception to September 23, 2024, with updates on August 31, 2025, and February 28, 2026. We included longitudinal studies developing or validating ML-DL models to predict future falls in community-dwelling adults aged ≥60 years and excluded real-time detection, simulated or no fall, and inpatient studies. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Areas under the curve (AUCs) were meta-analyzed using Hartung-Knapp-Sidik-Jonkman random-effects models with 95% CIs. Heterogeneity, 95% prediction intervals (PIs), sensitivity analyses, and subgroup analyses were conducted. After screening 10,253 records, 28 (0.3%) studies were included; 18 (64.3%) focused on general older adults. Prediction horizons ranged from 3 months to 7 years, and fall incidence ranged from 1.6% to 46.6%. Twenty-three (82.1%) studies applied ML, and 5 (17.9%) studies used DL. Input modalities included text (n=18, 64.3%), sensor (n=5, 17.9%), image (n=1, 3.6%), and multimodal data (n=4, 14.3%). Common predictors included age, sex, fall history, depression, and basic daily activities. Only one model underwent external validation. Calibration reporting was sparse. All models were rated at high risk of bias. Ten models were meta-analyzed, yielding a pooled AUC of 0.79 (95% CI 0.69-0.87) with extreme heterogeneity (τ²=0.64; τ=0.80; I²=99.8%; Q=4128.99). 
The confidence-distribution bootstrap PI was 0.20 to 0.99, indicating substantial uncertainty in expected performance across new populations. Subgroup analyses indicated moderation by sample size and population type, with higher discrimination in specific populations than in general samples; however, the specific population subgroup included only 2 studies. Although all participants were community dwelling, some cohorts were recruited through clinically enriched pathways rather than general community sampling. ML-DL models show potential for identifying community-dwelling older adults at elevated future fall risk; however, wide PIs, limited external validation, and high risk of bias suggest real-world performance may be optimistic. The pooled AUC should be interpreted as a summary of reported discrimination under study-specific conditions, predominantly from internally validated, high-risk-of-bias models, rather than as a robust estimate of transportable real-world performance. This review extends prior reviews by focusing on community-dwelling settings and by integrating PROBAST, Hartung-Knapp-Sidik-Jonkman meta-analysis, PIs, and modality-specific synthesis to evaluate both discrimination and uncertainty. Findings support the use of ML-DL models for proactive fall prevention while emphasizing the need for validation and context-specific implementation.
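The heterogeneity figures reported in this abstract can be related to one another with the standard Higgins I² formula, I² = (Q − df) / Q. The sketch below is illustrative only: it assumes the Q statistic had df = k − 1 = 9 degrees of freedom because ten models were meta-analyzed, which the abstract does not state explicitly, and the function name `i_squared` is made up for this example.

```python
def i_squared(q: float, k: int) -> float:
    """Return Higgins' I-squared (as a percentage) from Cochran's Q and k studies.

    Assumes df = k - 1; negative values are truncated at zero by convention.
    """
    df = k - 1
    if k < 2 or q <= 0:
        raise ValueError("need at least two studies and a positive Q")
    return max(0.0, (q - df) / q) * 100.0


# Values taken from the abstract: Q = 4128.99, ten pooled models (assumed df = 9).
i2_falls = i_squared(4128.99, 10)
print(f"I^2 = {i2_falls:.1f}%")
```

Under that assumption the result rounds to the 99.8% given in the abstract.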
- Research Article
- 10.1186/s12882-026-05037-2
- May 12, 2026
- BMC nephrology
- Xuanhao Fan + 5 more
Acute kidney injury (AKI) is a common complication following pediatric cardiac surgery, frequently leading to poor outcomes and even death in severe cases. Early prevention remains the primary intervention strategy. Studies have developed prediction models to identify at-risk children at an early stage. This study systematically evaluates existing AKI prediction models to support their clinical utility and future refinement. PubMed, Embase, Web of Science, Cochrane Library, China National Knowledge Infrastructure, Wanfang and SinoMed were searched from inception to 31 December 2024. The search of references from included studies, as well as the manual search, extended until 30 November 2025. Literature searching, screening, and data extraction were done by two authors. Quality was evaluated according to the prediction model risk of bias assessment tool (PROBAST). The area under the receiver operating characteristic curve (AUROC) was pooled using a random-effects model to summarize the overall performance of existing models, and sources of heterogeneity in performance were explored through subgroup analysis and meta-regression. Sensitivity analysis and Egger's method were used to analyze the stability of the included studies and to identify publication bias. This study was registered with PROSPERO (CRD42024593112) and reported following the Transparent Reporting of Multivariable Prediction Models for Individual Prognosis or Diagnosis: Checklist for Systematic Reviews and Meta-Analysis (TRIPOD-SRMA). A total of 2189 studies were screened, representing all studies retrieved from the database search, the search of references from included studies, and the manual search. Nineteen studies were included in this review. Included studies differed in study design, AKI definition, predictor screening, model development and validation, and model performance. 
The overall pooled AUROC was 0.850 (95% CI, 0.810-0.890), but all studies were evaluated as high risk of bias using the PROBAST. Heterogeneity in model performance was high, and study design and development methods were identified as possible sources of heterogeneity in pooled AUROC. Results were stable in sensitivity analysis, and no publication bias was detected. This systematic review suggests that machine learning models for predicting postoperative AKI in pediatric cardiac surgery show good discriminative ability. However, the high risk of bias across all included studies and the significant heterogeneity in model performance indicate that the reported performance may be overestimated. The high heterogeneity observed highlights the substantial variability in model performance, which is likely driven by differences in study design and development methods. The clinical utility of these models is currently limited by the lack of external validation in most studies and by the methodological limitations identified. Future research must incorporate rigorous study design, transparent reporting based on the TRIPOD guidelines, and external validation to develop prediction models with clinical utility.
- Research Article
- 10.1038/s41391-026-01107-6
- May 12, 2026
- Prostate cancer and prostatic diseases
- Jianliang Liu + 8 more
There has been a recent surge of interest in the application of artificial intelligence (AI) in prostate cancer (PCa). When evaluating PSMA PET scans, AI has shown promise in detecting intraprostatic cancer and metastatic disease. This systematic review aims to assess the ability of AI to detect or predict the progression of PCa during active surveillance (AS). This systematic review was registered on PROSPERO (ID CRD42024529354) and conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines. A comprehensive literature search was performed on Medline, Embase, Web of Science, and IEEE Xplore. Only studies evaluating AI in AS-eligible patients were included. After screening 842 articles, 12 studies were suitable for inclusion. The included studies comprised 4622 AS patients, of whom 1022 experienced progression. Only three studies developed their AI purely on clinicopathological variables. The area under the curve (AUC) of these three AI models ranged between 0.65 and 0.76, and the AI algorithm in one study outperformed traditional logistic regression. The integration of MRI parameters, particularly the use of radiomics, improved the ability of AI to predict progression compared with clinicopathological variables alone. AI was also able to analyse serial MRI during AS and performed on par with the Prostate Cancer Radiological Estimation of Change in Sequential Evaluation (PRECISE) scoring system. The AUC of the AI algorithms that included MRI parameters ranged between 0.65 and 0.95. One of the limitations was the variability in study methodologies and inclusion criteria. The Prediction Model Risk of Bias Assessment Tool (PROBAST) was used, and none of the studies had a high risk of bias. AI shows promise in detecting PCa progression during AS. However, this systematic review highlights the need for larger prospective studies with external validation before AI can be integrated into the AS process.
- Research Article
- 10.3760/cma.j.cn112338-20250903-00624
- May 10, 2026
- Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi
- M H Yan + 3 more
The prediction model risk of bias assessment tool (PROBAST) is difficult to apply when assessing the risk of bias of prediction models based on artificial intelligence (AI). This paper interprets PROBAST+AI, which covers two phases: model development assessment, with 16 targeted questions measuring model quality and applicability; and model validation assessment, with 18 signalling questions assessing risk of bias and applicability. Both phases cover four domains: study population and data sources, predictor variables, outcomes, and statistical analysis methods. Applicability is evaluated for three of these domains: study population and data sources, predictor variables, and outcomes. By comparing PROBAST+AI with PROBAST 2019, this paper helps researchers better understand and apply PROBAST+AI, thereby enhancing the transparency, credibility, and scientific value of AI modelling studies.
- Research Article
- 10.1016/j.suronc.2026.102446
- May 8, 2026
- Surgical oncology
- Basma M El-Khalifa + 10 more
Erector spinae plane block (ESPB) versus modified pectoral plane block (PECS II) in managing post modified radical mastectomy pain: A systematic review and meta-analysis.
- Research Article
- 10.1007/s00520-026-10749-0
- May 8, 2026
- Supportive care in cancer : official journal of the Multinational Association of Supportive Care in Cancer
- Chunjian Xu + 5 more
Timely palliative care can reduce the disease burden and improve quality of life in patients with cancer. Although several studies have developed assessment models for palliative care needs in cancer patients, the quality and clinical applicability of these models remain unclear. To systematically review existing assessment models for palliative care needs in patients with cancer, with a focus on their characteristics, predictors, risk of bias, and applicability. A systematic search was conducted in PubMed, Cochrane Library, Embase, Web of Science, CINAHL, Scopus, and China National Knowledge Infrastructure (CNKI) through September 12, 2025. Data extraction and evaluation were performed by two researchers based on the Prediction Model Risk of Bias Assessment Tool (PROBAST). A total of 5714 articles were identified, and eight studies were included, which covered 24 models for assessing palliative care needs. The sample size of the included studies ranged from 179 to 54,628, with areas under the curve ranging from 0.724 to 0.998. The models in all the included studies encompassed four categories of predictive factors: general demographic data, symptom/functional assessments, laboratory indicators, and treatment status. Five studies were rated as having a high risk of bias, primarily due to high risks associated with participants and conclusions, with generally low applicability. Existing models demonstrate potential for identifying patients with cancer who have increased palliative care needs using routinely collected clinical data. Commonly included predictors were symptom burden, functional status, laboratory parameters, treatment-related factors, and demographic characteristics. 
However, the overall body of evidence is constrained by a substantial risk of bias, particularly arising from inappropriate data sources, limited sample sizes, suboptimal handling of continuous variables, insufficient reporting of missing data, and the lack of robust internal or external validation. In addition, many models adopted mortality-based surrogate outcomes rather than clinically meaningful indicators of palliative care needs. Therefore, the currently available models should be interpreted with caution, and further high-quality model development and external validation are required before they can support broader routine clinical implementation. Future research should prioritize clinically actionable outcomes and incorporate patient-, caregiver-, and family-level factors to improve the relevance of these models for referral decisions and care planning.
- Research Article
- 10.1136/emermed-2025-215673
- May 6, 2026
- Emergency medicine journal : EMJ
- Yahya Al Fathil + 1 more
Chest pain is the second leading emergency department (ED) presentation, with its associated diagnostics requiring ED resource utilisation. Radiography is used in 70% of cases but identifies clinically significant findings in only 1.5%-2.1%. The predominance of non-actionable imaging results, combined with a paucity of decision rules, prompted this systematic review to inform the development of a new clinical decision rule (CDR). Four bibliographic databases were searched: PubMed, MEDLINE, EMBASE and Cochrane. Study selection, extraction and quality assessment were conducted independently by two reviewers via Covidence. Studies using a shared clinical decision tool were pooled to calculate sensitivity, specificity, likelihood ratios and false-positive rates using Meta-DiSc V.2.0. Univariate and, where possible, bivariate analyses generated forest plots and summary receiver operating characteristic curves. Heterogeneity was quantified by I², and methodological bias was assessed via the Prediction model study Risk of Bias Assessment Tool (PROBAST). From 626 records, 7 studies (6654 ED patients; Canada, Australia, USA) met the inclusion criteria. Of these, four validation studies were analysed further. Two studies examined the Hess CDR, reporting 98.3% sensitivity (95% CI 17% to 100%) and 47.6% specificity (95% CI 43.8% to 51.3%). Two studies examined the Rothrock CDR and reported 88.6% sensitivity (95% CI 80.1% to 93.7%) and 73% specificity (95% CI 17.7% to 97.2%). Hess had a negative likelihood ratio of 0.04 (95% CI 0 to 9.17) compared with Rothrock (0.156, 95% CI 0.06 to 0.38), and Rothrock had a positive likelihood ratio of 3.3 (95% CI 0.52 to 20.95) compared with Hess (1.9, 95% CI 1.67 to 2.11). Meta-analysis showed high heterogeneity with low bias as per PROBAST criteria. A systematic review and meta-analysis of two chest X-ray decision rules for non-traumatic chest pain found the Hess et al rule more sensitive but unlikely to reduce imaging. 
Evidence is limited by few studies, high heterogeneity and retrospective cohorts. Neither rule is recommendable, highlighting the need for prospective derivation using established methodological standards.
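The likelihood ratios quoted in this abstract follow from the usual definitions, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below is a minimal illustration under the assumption that the reported ratios are simple functions of the pooled sensitivity and specificity; the abstract's own values were produced by Meta-DiSc and could have been pooled differently. The function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Point estimates taken from the abstract.
hess = likelihood_ratios(0.983, 0.476)      # Hess CDR
rothrock = likelihood_ratios(0.886, 0.73)   # Rothrock CDR

print(f"Hess:     LR+ = {hess[0]:.1f}, LR- = {hess[1]:.2f}")
print(f"Rothrock: LR+ = {rothrock[0]:.1f}, LR- = {rothrock[1]:.3f}")
```

The computed point estimates round to the values in the abstract (Hess 1.9 and 0.04; Rothrock 3.3 and 0.156). The wide confidence intervals cannot be recovered this way because they depend on the underlying study counts.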
- Research Article
- 10.2196/91659
- May 6, 2026
- Journal of medical Internet research
- Shewen Lyu + 4 more
Deep learning (DL) algorithms for digital breast tomosynthesis (DBT) have proliferated, demonstrating emerging potential in enhancing lesion detection and classification. This study aimed to compare the diagnostic performance of DL algorithms for DBT with that of radiologists of varying experience and assess the clinical impact of DL assistance. A systematic search of PubMed, Embase, Web of Science, and the Cochrane Library was conducted up to November 8, 2025. Included studies compared the performance of stand-alone DL algorithms for DBT, radiologist interpretation alone, and DL-assisted diagnosis. Study quality was assessed using the Prediction Model Risk of Bias Assessment Tool+Artificial Intelligence (PROBAST+AI). Performance metrics were pooled using bivariate random effects and generalized linear mixed models. A total of 13 studies with 38,565 patients were included in the final analysis. Stand-alone DL algorithms achieved a pooled sensitivity of 0.88 (95% CI 0.80-0.93), specificity of 0.74 (95% CI 0.59-0.85), and area under the receiver operating characteristic curve (AUC) of 0.89 (95% CI 0.86-0.92). While DL performance showed no statistically significant difference compared to all radiologists (AUC=0.89 vs 0.88; P=.64) or senior radiologists (AUC=0.89 vs 0.90; P=.48), DL demonstrated significantly superior sensitivity compared to junior radiologists (0.88 vs 0.76; P=.03). Notably, DL assistance did not statistically improve diagnostic metrics for radiologists across any experience level. Meta-regression identified validation methods as a significant source of heterogeneity. DL algorithms for DBT exhibited strong diagnostic proficiency and showed higher sensitivity than junior radiologists, suggesting their potential utility as adjunctive tools to help reduce oversight in less experienced settings. 
However, given that DL assistance did not significantly elevate overall human performance, current models act primarily as supplementary aids rather than definitive clinical tools. Future prospective multimodal studies are warranted to validate these findings and optimize clinical integration.
- Research Article
- 10.1007/s12630-026-03114-6
- May 5, 2026
- Canadian journal of anaesthesia = Journal canadien d'anesthesie
- Alisia Chen + 6 more
Current management of type 2 diabetes mellitus and obesity increasingly includes treatment with glucagon-like peptide-1 receptor agonists (GLP-1 RAs). In this systematic review and meta-analysis, we sought to characterize the effect of GLP-1 RAs on gastric emptying half-time (T½). We conducted a systematic review of prospective studies reporting T½ with and without GLP-1 RA treatment. Inclusion criteria were 1) patients aged ≥ 18 yr taking a GLP-1 RA for diabetes mellitus and/or weight loss, 2) gastric emptying assessment reported as emptying T½, and 3) study design that was a randomized controlled trial or a prospective cohort study. We searched the following databases: MEDLINE, MEDLINE In-Process/ePubs, Embase, Cochrane Central Register of Controlled Trials, American Psychological Association (APA) PsycInfo®, and CINAHL. We assessed the quality of the studies with the risk of bias assessment tool for randomized controlled trials and the Risk of Bias in Non-randomized Studies-of Interventions (ROBINS-I) tool for nonrandomized trials. We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) classification for assessment of the certainty of the evidence. We performed multilevel random effects meta-analysis to pool standardized mean differences. We included 10 studies (N = 300), one of which reported two independent samples; these were treated as two independent studies for the purpose of meta-analysis. Glucagon-like peptide-1 receptor agonists significantly increased gastric emptying T½, with a large effect size (standardized mean difference, 2.38; 95% confidence interval [CI], 1.05 to 3.71; P < 0.001) corresponding to a mean prolongation of 74 min (95% CI, 46 to 101). 
There was a trend towards a more pronounced effect with short-acting GLP-1 RAs, with a standardized mean difference of 3.86 (95% CI, 2.37 to 5.35) corresponding to a prolongation of 116 min (95% CI, 71 to 161), and in early treatment phases (< 10 weeks), with a standardized mean difference of 2.72 (95% CI, 1.15 to 4.35) and a mean prolongation of 82 min (95% CI, 35 to 131). Nevertheless, the certainty of the effect size following the GRADE classification was "very low." In this systematic review and meta-analysis, we found that GLP-1 RAs significantly prolonged gastric emptying T½ by a mean of 74 min, which could have implications for perioperative care. There was a trend towards a more pronounced effect with short-acting (vs long-acting) drugs and in the early treatment phases (< 10 weeks). PROSPERO (CRD42023461665); first submitted 8 September 2023.
- Research Article
- 10.3389/fimmu.2026.1702830
- May 5, 2026
- Frontiers in Immunology
- Si-Qi Zhu + 1 more
Objective: This systematic review and meta-analysis aims to evaluate the effects of Tai Chi Chuan on the expression of the pro-inflammatory genes IL-6, IL-1β, and TNF-α (downstream of the NF-κB pathway) in adults with chronic diseases. It further explores potential anti-inflammatory mechanisms and identifies research gaps in the literature regarding these mechanisms. Methods: This study searched seven electronic databases for relevant literature, restricted to publications in English and Chinese. The risk of bias in all included trials was assessed using the Cochrane Risk of Bias Assessment Tool (version 2.0) and the GRADE (Grading of Recommendations, Assessment, Development and Evaluation) system. Standardized mean differences (SMD) with 95% confidence intervals (CI) were used to evaluate pooled effect sizes. P values < 0.05 were considered statistically significant. Subgroup analyses were conducted according to disease systems. Results: We retrieved a total of 1,110 relevant studies, with 20 studies ultimately included in the analysis. These covered diseases across multiple systems, including oncology, endocrinology, respiratory, and neurological disorders. To more directly reflect intervention effects, we extracted the mean ± standard deviation of change values post-intervention compared to baseline as the analysis data. After assessing publication bias and minimizing heterogeneity effects, we found that Tai Chi Chuan significantly reduced the expression of downstream target genes (SMD = -0.48, 95% CI: -0.76 to -0.19, p < 0.01) and significantly down-regulated IL-6 (SMD = -0.66, 95% CI: -1.27 to -0.06, p = 0.03) and IL-1β (SMD = -0.59, 95% CI: -0.95 to -0.23, p < 0.01), while TNF-α showed a downward trend but without statistical significance (SMD = -0.28, 95% CI: -0.59 to 0.02, p = 0.07). Subgroup analysis revealed that patients with endocrine and respiratory system diseases derived the most significant benefit. 
Conclusion: Tai Chi can alleviate systemic inflammation in patients with chronic diseases by suppressing NF-κB-driven pro-inflammatory gene expression, demonstrating both safety and feasibility. Furthermore, we identified gaps in existing research on Tai Chi and NF-κB, particularly the lack of randomized controlled trials (RCTs). Future studies should conduct RCTs with NF-κB core proteins and factors as direct outcome measures to directly elucidate Tai Chi’s regulatory effects on the NF-κB pathway. Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/, identifier CRD420251112908.
- Research Article
- 10.1002/epi.70274
- May 5, 2026
- Epilepsia
- Alyssa A Federico + 5 more
Prediction models are increasingly being sought in epilepsy surgery to predict postoperative outcomes and support clinical decision-making. Studies summarizing the evidence in this area can provide insight into the type of surgical prediction models, their methodology, and their performance and inform areas for future research. Our aim was to address these knowledge gaps through a comprehensive systematic review of prediction models in epilepsy surgery. A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines using four databases. Papers were included if they were primary research studies, human-based, studied adult or pediatric populations, studied people with epilepsy undergoing surgical management, and developed or validated a multivariable tool to predict epilepsy surgery outcomes. Data extraction was reviewed in triplicate, and the quality of evidence in each paper was assessed using the Prediction Model Risk of Bias Assessment Tool. The literature search yielded a total of 11 614 papers, with 42 papers and 113 prediction models included in the final analysis. The median area under the curve and accuracy for all models were .75 (interquartile range = .68-.83) and .76 (interquartile range = .69-.83), respectively. Overall, 54.0% of models underwent internal validation, and 20.4% underwent external validation. Models of cognitive-language outcomes seemed to perform better than those for other outcomes. Overall risk of bias was high in 81% of models, with weakest performance in outcomes and analyses, but trended toward improvement over time. Concerns for applicability were low in 89% of the models. Prediction models in epilepsy surgery are rapidly proliferating, but most lack external validation, and many still exhibit a high risk of bias. Therefore, caution is needed when interpreting and applying these predictive tools. 
Evidence of improvement in methodological quality holds promise for enhancing patient care, if coupled with improved model performance.
- Research Article
- 10.1016/j.bneo.2026.100217
- May 1, 2026
- Blood neoplasia
- Xiaoyi Zhang + 8 more
Machine learning and deep learning tools have been proposed to improve survival prediction in acute myeloid leukemia (AML), but comparative benchmarks remain unclear. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 searches of PubMed, Scopus, and Web of Science (January 2018 to March 2025) identified studies that developed or externally validated artificial intelligence (AI)-based models for overall or relapse-free survival and reported the area under the receiver operating characteristic (ROC) curve (AUC). Two reviewers extracted design, population, features, algorithms, and training/validation AUCs and assessed risk of bias using the Prediction model Risk of Bias Assessment Tool (PROBAST). Random-effects meta-analysis (DerSimonian-Laird) pooled validation AUCs overall and by horizon (1/2/3/5 years) and feature category (gene-centric vs nongenetic). Optimism bias was defined as the difference between training and validation AUCs. We included 24 predominantly retrospective studies (137 model cohorts; approximately 51 055 patients). Of 120 PROBAST domain ratings, 74% were low risk, 25% unclear, and <1% high; statistical analysis was the weakest domain. Across 73 independent validation cohorts, the pooled AUC was 0.769 (95% confidence interval [CI], 0.742-0.795) with substantial between-study variability (I² = 95.7%, meaning most of the spread reflects real differences across cohorts rather than chance). Validation AUCs increased with longer horizons (1-year, 0.748; 2-year, 0.760; 3-year, 0.760; 5-year, 0.833). Pooled development AUC was 0.801 vs 0.749 in matched validation sets (ΔAUC, 0.052; 95% CI, 0.041-0.063). Nongenetic models achieved a pooled validation AUC of 0.776 vs 0.741 for gene-centric models (ΔAUC, 0.035; P = .085). AML AI prognostic models show moderate discrimination with modest optimism but substantial heterogeneity and limited prospective validation, supporting standardized reporting and rigorous external evaluation.
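The DerSimonian-Laird random-effects pooling named in this abstract can be sketched as follows. This is a generic textbook version, not the authors' analysis code: the AUCs and standard errors are invented for illustration, the pooling is done on the raw AUC scale (many reviews pool on the logit scale instead), and the function name `dersimonian_laird` is hypothetical.

```python
import math


def dersimonian_laird(estimates: list[float], std_errors: list[float]) -> dict[str, float]:
    """Pool study estimates with the DerSimonian-Laird random-effects method."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need matching lists with at least two studies")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se**2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)

    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical validation AUCs and standard errors (not from the review).
aucs = [0.72, 0.78, 0.81, 0.69, 0.85]
ses = [0.020, 0.015, 0.030, 0.025, 0.020]
result = dersimonian_laird(aucs, ses)
print({name: round(value, 3) for name, value in result.items()})
```

When the studies disagree more than their standard errors would predict, tau² becomes positive and the weights flatten, so small studies count for more than they would in a fixed-effect analysis.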
- Research Article
- 10.1016/j.jad.2026.121255
- May 1, 2026
- Journal of affective disorders
- Sophie J Fairweather + 6 more
Prediction of atypical health trajectories may enable early intervention. We systematically reviewed the existing literature on models for predicting longitudinal depression and/or anxiety trajectories. MEDLINE, Embase and APA PsycINFO were searched (from inception to 31-Jan-2025). We included population-based studies of children and adults (aged 3-65 years). Risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST-AI). Seven of the nine included studies were in adult populations with a diagnosis of depression or anxiety at baseline; two focused on child and adolescent populations. Only one study included anxiety trajectories. Identified trajectories typically comprised three to four groups, including chronic/persistent-high, stable-low, increasing/worsening, and improved/remitted groups. Various supervised predictive modelling methods were used. The number of final predictors included in models ranged from three to 152. Family and own/personal psychiatric history were the most common predictors but were not always important for model performance. Models including more predictors did not always perform better. Overall risk of bias was high in all studies. No studies were externally validated and no studies assessed the clinical utility of models. This review highlights a need for robust, validated models that can forecast future risk of persistent or worsening anxiety and depression, especially in young people where early intervention is possible.
- Research Article
- 10.1016/j.neubiorev.2026.106582
- May 1, 2026
- Neuroscience and biobehavioral reviews
- Charlie W Mcdonald + 3 more
"I'm not here, this isn't happening": Interoception and its role in dissociation - A systematic review.
- Research Article
- 10.1016/j.oret.2026.04.022
- May 1, 2026
- Ophthalmology. Retina
- Abdullah Al-Ani + 9 more
Artificial Intelligence-Based Prognostic Models for Postoperative Outcomes in Vitreoretinal Surgery: A Systematic Review and Meta-Analysis.
- Research Article
- 10.1002/hsr2.72205
- May 1, 2026
- Health science reports
- Seyedeh Narjes Ahmadizadeh + 6 more
The accurate prediction of mortality risk among critically ill children represents a major challenge in Pediatric Intensive Care Units (PICUs). Artificial Intelligence (AI) can find intricate patterns linked to mortality risk by examining large volumes of clinical data. This can help physicians better predict mortality risk and provide more efficient care. A systematic review can summarize the current evidence regarding the role of AI in predicting mortality among critically ill children in the PICU. A comprehensive and systematic search was conducted across three major electronic databases: PubMed, Scopus and Web of Science. The database searches were conducted on December 2, 2024. The screening process was conducted in two stages: title and abstract screening, and full-text review. This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The Prediction model study Risk Of Bias Assessment Tool (PROBAST) was used to assess the risk of bias and concerns regarding the applicability of the included studies. Ultimately, the full text of 17 articles was reviewed for study evaluation and data extraction. Of the included articles, 76% were published in 2020 or later and 24% before 2020. The most frequently used algorithm in the studies was random forest. Overall, the Area Under the Receiver Operating Characteristic curve (AUROC) was greater than 0.8 in 88% of the studies and less than 0.8 in 12% of the studies. The studies emphasize the crucial role of machine learning and deep learning in improving mortality prediction in PICUs. The variability in AUROC values between different methods shows that while certain models excel in certain contexts, the choice of algorithm and feature selection significantly affects prediction accuracy.
- Research Article
- 10.1001/jamaoncol.2026.1023
- Apr 30, 2026
- JAMA Oncology
- Marcin Miszczyk + 25 more
Standard-of-care management of radiorecurrent prostate cancer (PCa) involves systemic therapy; however, some patients seek to avoid the adverse events (AEs) that are associated with androgen-deprivation therapy (ADT). To determine outcomes of local therapy without systemic therapy for radiorecurrent PCa. MEDLINE, Embase, Web of Science Core Collection, and Google Scholar were searched from inception up to May 2025. No date or language filters were used. Data were analyzed from June to November 2025. Prospective and retrospective studies were selected that investigated local salvage therapies without concomitant systemic treatment for locally recurrent PCa after definitive radiotherapy. Eligible studies provided ADT-free survival (ADT-FS) and/or metastasis-free survival (MFS). Authors were contacted for additional data. This study was prospectively registered and adhered to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Risk of bias was assessed using the Risk of Bias Assessment Tool for Nonrandomized Studies, version 2. Individual patient data were reconstructed from Kaplan-Meier curves or retrieved from authors. ADT-FS, MFS, and rates of AEs were pooled in random-effects models. Main outcomes were ADT-FS and MFS, which were modeled as pooled summary Kaplan-Meier curves, and rates of severe or worse AEs, which were modeled as proportions. Outcomes were stratified by treatment method. Thirty-one studies (4525 patients) were identified that assessed salvage high-dose-rate brachytherapy (HDR-BT; 336 patients), low-dose-rate brachytherapy (LDR-BT; 92 patients), stereotactic body radiotherapy (SBRT; 213 patients), radical prostatectomy (sRP; 1476 patients), cryotherapy (1621 patients), high-intensity focused ultrasonography (HIFU; 677 patients), or mixed methods (110 patients). Prospective studies comprised approximately one-fourth of the evidence (1055 patients); however, none were identified for sRP. 
Pooled 2-year and 5-year ADT-FS (2887 patients) were 76.8% and 55.2%, respectively. Pooled 2-year and 5-year MFS (3425 patients) were 90.4% and 75.2%, respectively. Rates of severe or worse AEs (2308 patients) were 14% for LDR-BT, 13% for sRP, 5% for HDR-BT, 5% for HIFU, 4% for SBRT, and 2% for cryotherapy. Risk of bias concerns primarily regarded patient selection. Limitations included a lack of randomized clinical trials. The findings of this systematic review and meta-analysis suggest that local therapies alone have reasonable efficacy in well-selected patients with locally radiorecurrent PCa. ADT-free survival was maintained for more than three-quarters of patients at 2 years and more than half at 5 years. Approximately 1 in 10 patients experienced an early metastatic event. Rates of severe toxic effects were manageable, in particular for salvage HDR-BT, HIFU, SBRT, and cryotherapy.
- Research Article
- 10.2196/82482
- Apr 30, 2026
- JMIR Research Protocols
- James W Navalta + 2 more
Background: Resting metabolic rate (RMR) prediction equations used today often rely on the consideration of binary sex. Significant intrasex variability and a lack of data on diverse populations raise concerns about these equations’ validity and generalizability. Existing systematic reviews have focused on specific populations like individuals with obesity or athletes, but none have systematically examined the demographic characteristics of participants used to derive these equations. Our central hypothesis is that the accuracy of RMR prediction is influenced by the demographic alignment between the equation’s derivation population and the individual. We present a systematic review protocol to critically evaluate the literature and participant demographic profiles that underpin current RMR prediction equations. Objective: Our objectives are to (1) determine the characteristics of participant populations, including reporting on gender and sex diversity, used in RMR equation research; (2) critically appraise the methodologies, findings, and reporting practices of studies that developed RMR equations for binary populations; and (3) use the Sex and Gender Equity in Research guidelines to assess sex and gender terminology and variable inclusion in the generative RMR prediction literature. Methods: Following a PROSPERO-registered protocol (CRD420251084400), we will conduct a comprehensive search across multiple databases, including Academic Search Premier, PubMed, and Web of Science. The final search string will be: ((resting metab* rate) OR (RMR) OR (basal metab* rate) OR (BMR) OR (metabol*) OR (resting energy expenditure) OR (metab* rate)) AND ((predict* equation) OR (predict* model) OR (predict* algorithm) OR (formula) OR (estimation equation)) AND ((demograph*) OR (characterist*) OR (age) OR (race) OR (ethnicity) OR (sex) OR (gender)). 
We will include peer-reviewed, English-language articles reporting studies that generated RMR prediction equations and reported human participant demographic characteristics. Exclusion criteria include studies not generating prediction equations, without demographic data, or involving animals. Data extraction will include reported participant demographics (eg, sex, gender, race or ethnicity, age, and body composition), RMR test protocols, and reported reliability or validity metrics. Risk of bias will be assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool). Results: This study was funded in June 2025 by the University of Nevada, Las Vegas Sports Innovation Initiative Catalyst Grant Funding Program and in July 2025 by the National Association for Kinesiology in Higher Education Hellison Interdisciplinary Research Grant. The databases were searched using the final search string between August 1, 2025, and August 8, 2025. Training of team members began on September 3, 2025, and concluded on October 20, 2025. Conclusions: Findings will be disseminated through a narrative synthesis submitted for publication, adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting guidelines. This review will identify gaps in the inclusivity and generalizability of current RMR prediction equations, informing future research and clinical applications. Trial Registration: PROSPERO CRD420251084400; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251084400 International Registered Report Identifier (IRRID): PRR1-10.2196/82482
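The Boolean search string in the protocol above has a regular shape: three concept groups joined by AND, each group a set of parenthesised terms joined by OR. The Python sketch below shows one way such a string could be assembled from term lists so that it stays reproducible across databases. The helper name `build_query` and the variable names are hypothetical and not part of the protocol; only the terms are taken from the abstract.

```python
def build_query(groups):
    """Join terms within each group by OR, and the groups themselves by AND."""
    clauses = []
    for terms in groups:
        # Each term is wrapped in parentheses, as in the protocol's string.
        clause = " OR ".join(f"({term})" for term in terms)
        clauses.append(f"({clause})")
    return " AND ".join(clauses)


# Terms copied from the protocol's final search string.
metabolism_terms = [
    "resting metab* rate", "RMR", "basal metab* rate", "BMR",
    "metabol*", "resting energy expenditure", "metab* rate",
]
prediction_terms = [
    "predict* equation", "predict* model", "predict* algorithm",
    "formula", "estimation equation",
]
demographic_terms = [
    "demograph*", "characterist*", "age", "race",
    "ethnicity", "sex", "gender",
]

query = build_query([metabolism_terms, prediction_terms, demographic_terms])
```

Keeping the terms in lists makes it easier to document any per-database adjustments, although whether each database interprets the `*` truncation and the nested parentheses identically would still need to be checked against that database's own search syntax.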