Articles published on Comparative effectiveness research
2631 Search results
- New
- Research Article
- 10.2196/77380
- Dec 31, 2025
- JMIR Cardio
- Yijun Liu + 5 more
Background: Atrial fibrillation (AF) ablation is an effective treatment for reducing episodes and improving quality of life in patients with AF. However, long-term AF-free rates after AF ablation are inconsistent across the population, ranging from 50% to 75%. Patient selection relies on individual clinical assessment, highlighting a critical gap in population-level predictive analytics. While existing risk scores (eg, CHADS₂ [congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, and stroke], CHA₂DS₂-VASc [congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, stroke, vascular disease, age, and sex category], CAAP-AF [coronary artery disease, left atrial diameter, age, AF, antiarrhythmic drugs, and female sex category]) have been applied to predict AF ablation outcomes, their performance in administrative claims data remains unclear. Leveraging large administrative claims databases represents an opportunity to develop standardized, scalable prediction models that could inform population health management and resource allocation at a national level. Objective: This study uses machine learning (ML) models on claims data to explore whether integrating International Classification of Diseases (ICD) billing codes outperforms traditional stroke and AF risk scores in predicting 1-year AF ablation outcomes. Methods: We analyzed claims data from the Merative MarketScan Research Medicare database (2013‐2020) to identify 14,521 patients who underwent AF ablation. To predict 1-year AF-free outcomes, we developed logistic regression and extreme gradient boosting (XGBoost) models using demographic characteristics, comorbidity indices, and ICD diagnostic codes from the 2 years preceding ablation. Model predictions were compared with claims-based implementations of established risk scores: CHADS₂, CHA₂DS₂-VASc, and a modified CAAP-AF (without left atrial diameter and the number of failed antiarrhythmic drugs).
The ML models were also assessed on subgroups of patients with paroxysmal AF, persistent AF, and both AF and atrial flutter from October 2015 onward. Results: Among 14,521 patients (mean age 71.5, SD 5.31 y; n=5800, 39.94% female), AF ablation success occurred in 54.01% (n=7843). XGBoost achieved areas under the receiver operating characteristic curve (AUCs) of 0.528, 0.521, and 0.529 for the whole, female, and male AF ablation groups, respectively, and better discrimination than CHADS₂, CHA₂DS₂-VASc, and the modified CAAP-AF in all AF ablation groups (whole population, female, and male). While CHA₂DS₂-VASc and the modified CAAP-AF showed higher recall (>0.798), their precision (<0.540) was lower than that of XGBoost (0.552‐0.556). In subgroup analyses of International Classification of Diseases, Tenth Revision (ICD-10) patients (n=7646), the models incorporating ICD codes demonstrated better performance than those using only demographic and comorbidity data across most AF subtypes, with the highest AUC (0.544) observed in patients with paroxysmal AF. Conclusions: While the ML models achieved statistically significant improvements over claims-based implementations of established clinical risk scores (AUC 0.528‐0.544 vs 0.498‐0.505), the modest predictive performance highlights challenges in predicting procedural outcomes using administrative data that lack key clinical variables (eg, left atrial size and medication details). Our findings establish that while standardized outcome prediction using nationally available administrative data is technically feasible, current performance is insufficient for clinical decision-making and better suited for health system quality monitoring and comparative effectiveness research applications.
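The discrimination and precision/recall figures quoted above can be made concrete with a small sketch. The code below is illustrative only and is not the study's pipeline: the function names and the four-patient toy data are assumptions introduced here. It computes AUC as the fraction of positive/negative pairs ranked correctly, and precision and recall at a fixed threshold.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall when scores >= threshold are predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (made up): 1 = AF-free at 1 year, 0 = recurrence.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(roc_auc(labels, scores))           # 3 of 4 pairs ranked correctly
print(precision_recall(labels, scores))  # one true positive, one false positive, one miss
```

One property worth keeping in mind when reading the reported numbers: a rule that labels every patient a success has recall 1.0 and precision equal to the outcome's base rate, so high recall paired with precision near the base rate indicates little discrimination.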
- New
- Research Article
- 10.1186/s12874-025-02741-9
- Dec 26, 2025
- BMC medical research methodology
- Hong Xiong + 3 more
Comparative effectiveness research with average hazard for censored time-to-event outcomes: simulation study and application to observational data.
- Research Article
- 10.1097/ta.0000000000004855
- Dec 17, 2025
- The journal of trauma and acute care surgery
- Susan Kartiko + 13 more
Chest wall injury (CWI) occurs in 10% to 15% of trauma admissions and is associated with significant short- and long-term morbidity. Despite recent advances in management, critical knowledge gaps remain. This study sought to identify consensus-based research priorities for CWI established by the National Trauma Research Action Plan (NTRAP). This study is a secondary analysis of consensus-based research priorities collected using an online Delphi survey methodology by 11 NTRAP panels, each focused on different domains across the entire spectrum of trauma care. The database of research questions or gaps was queried for the key words "Chest Wall/Rib," "Rib Fracture/Pain Management," and "Rib Fracture/Pulmonary Management." Fifty-seven CWI-related research questions were identified across seven NTRAP panels. Of these, 15 (26%) were rated as high priority and 42 (74%) as medium priority. Most CWI-related research questions appeared in the following topics: Chest Wall/Ribs (n = 22), followed by Blocks/Regional Anesthesia: Effects on Pain (acute/chronic, hemodynamics, inflammation) (n = 5) and Special Populations: Long-term Outcomes after Trauma in Older Adults; Functional Recovery and Mortality (n = 3). Eighteen questions specifically addressed surgical rib fixation. The National Trauma Research Action Plan identified 57 consensus-driven research priorities in CWI. These findings should inform extramural funding efforts, focusing on studies that evaluate short-term clinical metrics, comparative effectiveness research between surgical and nonsurgical management, and the long-term impact of CWI on patient recovery and quality of life. Expert Opinion/Consensus; Level V.
- Research Article
- 10.1097/sla.0000000000007002
- Dec 17, 2025
- Annals of surgery
- Nicolò Pecorelli + 23 more
To assess the reliability and construct validity of the CCI®️ following pancreatic surgery. The Comprehensive Complication Index (CCI®️) is the only validated metric that quantifies cumulative morbidity, with a continuous score ranging from 0 (no complications) to 100 (death). To address construct validity, we assessed patients undergoing elective pancreatic surgery for any disease at five Italian centers enrolled in a randomized controlled trial (NCT04438447) and a prospective cohort study (NCT04431076). The severity of 90-day complications was assessed using the CCI®️. We tested 10 a priori construct validity hypotheses through linear regression. Regression coefficients represented the between-group mean difference in CCI®️, with an effect size ≥0.2 considered potentially meaningful. Validity was deemed adequate if >75% of the hypotheses were supported. To address reliability, three independent raters among six centers assessed the CCI®️ from 100 anonymous case vignettes to evaluate inter-rater and inter-center reliability through intraclass correlation coefficient (ICC) and standard error of measurement (SEM). A total of 797 patients were included (66±11 y, 50% female, 60% malignancy). Construct validity was supported by the data, with 9/10 a priori hypotheses confirmed (90%). The CCI®️ showed excellent inter-rater reliability (ICC=0.96, 95% CI: 0.95-0.97) and high inter-center reliability (ICC >0.75 in each center), with a SEM ranging from 2.73 to 6.38. This study supports the CCI®️ as a valid and reliable measure of morbidity after pancreatic surgery, supporting its use in both clinical practice and comparative effectiveness research.
- Research Article
- Dec 16, 2025
- Alternative therapies in health and medicine
- Aqsa Saman + 6 more
Increasing evidence suggests the effectiveness of virtual reality (VR)-based neuro-rehabilitation. However, the evidence is not well defined, specifically for progressive neurological disorders. This study aimed to determine the efficacy and safety of VR therapy over conventional therapy in treating progressive neurological disorders in adults. The study comprises a systematic review and a meta-analysis, following the PRISMA guidelines, and was registered with PROSPERO (CRD42024582827). Relevant literature was searched in electronic databases including Scopus, Web of Science, PEDro, PubMed, Cochrane Library, and Google Scholar. Seven articles were selected after eliminating irrelevant ones based on inclusion and exclusion criteria. The PEDro scale was used to assess the methodological rigor of the selected studies. The risk of bias was evaluated using the Cochrane Risk of Bias 2 tool. Within an academic research context, the published studies from the databases were used for this study. No participants were directly recruited; this review included participants reported in the included studies. VR therapies (non-immersive or semi-immersive) were compared with conventional therapy as reported in the original studies. Outcome measures were motor rehabilitation of the upper or lower limbs, balance, quality of life (QoL), and adverse effects. Both groups demonstrated improvement in analyzed parameters (e.g., motor functions, balance, and QoL). No difference was found in motor function measures between groups. The QoL measures favored the VR group without reaching statistical significance, while the balance measures significantly favored conventional therapy. Moreover, VR therapy was not significantly linked with adverse effects, except for some minor reactions. Non-immersive or semi-immersive VR was at least on par with conventional therapy for assessed outcome measures, except for the balance measures, which significantly favored conventional therapy.
Keywords: virtual reality, progressive neurological disorders, neurodegenerative diseases, neurological rehabilitation, systematic review, comparative effectiveness research.
- Research Article
- 10.1111/add.70288
- Dec 14, 2025
- Addiction (Abingdon, England)
- Payel J Roy + 7 more
Comparative effectiveness research studies commonly restrict cohorts to individuals who initiate a medication and do not have evidence of prior treatment. This is particularly challenging in research on medications for opioid use disorder (MOUD) because of sporadic use or intermittent adherence. We examined the impact of different lookback windows and washout criteria to identify MOUD initiator cohorts on sample size, cohort characteristics, and misclassification of treatment initiation. Cohort study using the Merative™ MarketScan® Multi-State Medicaid Database (2011-2022). Medicaid-insured adults aged 18-64 with an MOUD prescription from 01/01/2022 to 12/31/2022 and a history of opioid use disorder (OUD) with at least 3 months of continuous enrollment. We created treatment initiator cohorts with increasingly restrictive lookback windows for inclusion (6-, 12-, 24-, 36-months, all-available). During each lookback window, we required [1] continuous enrollment; [2] continuous enrollment and OUD diagnosis; or [3] continuous enrollment, OUD diagnosis, and no prior treatment with MOUD. We defined prior treatment with MOUD as: (a) ≥30 days of use (less restrictive definition; allowed for some prior treatment); or (b) ≥1 day of use (more restrictive definition; did not allow prior treatment). We quantified changes in cohort sample size, demographic characteristics, and proportion of prevalent use episodes misclassified as MOUD treatment initiation (gold standard: 36-month lookback window). We identified 103 794 eligible MOUD initiators (64.8% buprenorphine, 24.8% methadone, 10.4% naltrexone). Sample size of the cohorts decreased with increasingly restrictive lookback windows and washout criteria: [1] continuous enrollment (range, 96.9% for 6 months to 51.8% for 36 months); [2] continuous enrollment and less restrictive washout (range, 29.7% to 8.4%); and [3] continuous enrollment and more restrictive washout (range, 22.2% to 5.8%).
All-available lookback performed similarly to a 12-month lookback. Longer lookback windows resulted in initiator cohorts with a greater proportion of individuals who were older, female, and of a minoritized race/ethnicity. The proportion of people with prevalent MOUD use misclassified as treatment initiation increased steadily with decreasing duration of lookback windows (24-, 12-, and 6-month); we observed misclassification among 16.1% to 49.2% of individuals (less restrictive washout), and 16.8% to 53.2% of individuals (more restrictive washout). The choice of lookback window duration and washout criteria in research on medications for opioid use disorder (MOUD) presents tradeoffs between cohort sample size, demographic characteristics, and misclassification of treatment initiation. This study offers practical guidance for researchers planning to perform comparative studies in MOUD.
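The interaction between lookback window and washout threshold described above can be sketched in a few lines. This is a simplified illustration, not the authors' cohort logic: the function name, the single-enrollment-span simplification, the `(fill_date, days_supply)` record layout, and the example patient are all assumptions made here.

```python
from datetime import date, timedelta


def is_new_initiator(index_date, enrollment_start, prior_fills,
                     lookback_months, washout_days):
    """Classify the index fill as treatment initiation.

    prior_fills: list of (fill_date, days_supply) tuples (hypothetical layout).
    washout_days: days of prior use that count as prior treatment
                  (30 = less restrictive, 1 = more restrictive).
    Assumes one continuous enrollment span beginning at enrollment_start.
    """
    # Approximate a month as 30.4375 days; real studies may use calendar months.
    window_start = index_date - timedelta(days=int(lookback_months * 30.4375))
    if enrollment_start > window_start:
        return False  # not continuously enrolled for the full lookback window
    prior_days = sum(
        days for fill_date, days in prior_fills
        if window_start <= fill_date < index_date
    )
    return prior_days < washout_days


# Made-up patient: one 14-day fill about 15 months before the index date.
index = date(2022, 6, 1)
enrolled = date(2019, 1, 1)
fills = [(date(2021, 3, 1), 14)]

print(is_new_initiator(index, enrolled, fills, 6, 1))    # fill falls outside a 6-month window
print(is_new_initiator(index, enrolled, fills, 24, 1))   # 24-month window sees the prior use
print(is_new_initiator(index, enrolled, fills, 24, 30))  # 14 days is below the 30-day threshold
```

The same patient is counted as an initiator under a short window and excluded under a longer one with the strict washout, which is the misclassification tradeoff the abstract quantifies.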
- Research Article
- 10.57264/cer-2025-0126
- Dec 9, 2025
- Journal of Comparative Effectiveness Research
- Andre Verhoek + 4 more
Aim: Fractional Polynomial (FP) models are widely used in survival analysis for health technology assessment and network meta-analysis (NMA). However, current implementations rely on a fixed set of pre-specified powers, which may constrain model flexibility, limit predictive performance and increase computational cost in Bayesian settings. This study introduces and evaluates a Bayesian FP modeling approach in which the powers are estimated as continuous parameters rather than fixed, aiming to simplify model selection and improve fit. Materials & methods: Second-order Bayesian FP models were implemented in Stan, allowing the time transformation powers (p1, p2) to be estimated from the data. Model performance was evaluated across three oncology NMA datasets: advanced non-small-cell lung cancer, metastatic prostate cancer and early breast cancer. The performance was assessed using visual fit, leave-one-out information criterion, root mean square error, incremental survival estimates and computational efficiency. Validation steps included posterior predictive checks, sensitivity analyses and long-term extrapolation. Results: Across all datasets, variable power models consistently achieved better statistical fit (lower leave-one-out information criterion and root mean square error) than fixed power models. Incremental survival estimates were also more stable and clinically plausible, particularly in datasets with complex hazard dynamics. While variable models required slightly more time per run, the approach greatly reduced the number of required model configurations, leading to lower overall computational burden. Conclusion: Bayesian FP models with variable powers not only improve model fit and simplify model selection but also reduce structural uncertainty by replacing exhaustive grid searches with a unified, data-driven estimation of transformation powers, while retaining interpretability and computational efficiency.
By producing robust, well-calibrated survival projections and streamlining model selection, this approach strengthens survival analysis for health technology assessment and supports more reliable decision-making in comparative effectiveness research.
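For readers unfamiliar with the functional form, the sketch below evaluates a conventional second-order fractional polynomial in time. It is a minimal illustration in Python rather than the authors' Stan model; the function names and coefficient values are assumptions, and the abstract does not say how the limiting cases (a power of exactly 0, or two equal powers) are handled when the powers are continuous parameters.

```python
import math


def fp_term(t, p):
    """Fractional polynomial transform of time: t**p, with p == 0 read as log(t)."""
    if t <= 0:
        raise ValueError("time must be positive")
    return math.log(t) if p == 0 else t ** p


def fp2(t, beta0, beta1, beta2, p1, p2):
    """Second-order fractional polynomial, e.g. for a log-hazard (assumed use).

    Conventional definition: beta0 + beta1*t^(p1) + beta2*t^(p2);
    when p1 == p2 the second term becomes t^(p1) * log(t).
    """
    first = fp_term(t, p1)
    second = first * math.log(t) if p1 == p2 else fp_term(t, p2)
    return beta0 + beta1 * first + beta2 * second


# Fixed-power grids typically search a small set such as
# {-2, -1, -0.5, 0, 0.5, 1, 2, 3}; a variable-power model would instead
# treat p1 and p2 as parameters to estimate.
print(fp2(4.0, 1.0, 2.0, 0.0, 0.5, 2))  # 1 + 2*sqrt(4)
print(fp2(1.0, 0.5, 2.0, 3.0, 0, 1))    # log(1) = 0, so 0.5 + 3*1
```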
- Research Article
- 10.1016/j.ejca.2025.116093
- Dec 1, 2025
- European journal of cancer (Oxford, England : 1990)
- Steven G Dubois + 35 more
Paediatric Strategy Forum for medicinal product development of agents targeting GD2 ganglioside in children and adolescents with cancer.
- Research Article
- 10.1111/petr.70242
- Dec 1, 2025
- Pediatric transplantation
- Alejandro Costaguta + 1 more
Towards Best Practice Development in Pediatric Liver Transplant Immunosuppression Management Through Comparative Effectiveness Research: The Scylla and Charybdis Dilemma.
- Research Article
- 10.1016/j.jtct.2025.08.027
- Dec 1, 2025
- Transplantation and cellular therapy
- Jaime M Preussler + 16 more
Proceedings From the Second Reimagining Caregiver Workshop: Addressing Caregiver Requirements for Hematopoietic Cell Transplant.
- Research Article
- 10.1016/j.clgc.2025.102435
- Dec 1, 2025
- Clinical genitourinary cancer
- Avani P Desai + 12 more
Data Sources for Clinical T1 Renal Masses and the Potential for Bias.
- Research Article
- 10.1016/j.jad.2025.119836
- Dec 1, 2025
- Journal of affective disorders
- Clotilde Guidetti + 23 more
Effect of augmentation with aripiprazole or augmentation with repetitive transcranial magnetic stimulation versus switching to the antidepressant venlafaxine extended release/duloxetine on cognition: A comparative effectiveness research trial for antidepressant incomplete and non-responders with treatment-resistant depression (ASCERTAIN-TRD).
- Research Article
- 10.3390/jcm14238506
- Nov 30, 2025
- Journal of clinical medicine
- Dimitris Baroutis + 9 more
Background/Objectives: Cervical insufficiency affects 1-2% of pregnancies and represents a significant cause of second-trimester loss and spontaneous preterm birth. This review synthesizes current evidence across the clinical spectrum of cervical insufficiency, providing evidence-based management guidance and identifying areas requiring further investigation. Methods: We conducted a comprehensive review of the current literature, evidence-based clinical guidelines, and landmark randomized controlled trials examining diagnostic frameworks, therapeutic interventions, and clinical outcomes across different presentations of cervical insufficiency. Our analysis incorporated data from major obstetric databases, professional society recommendations, and recent comparative effectiveness research. Results: Cervical insufficiency diagnosis encompasses three primary categories: history-based, ultrasound-based, and physical examination-based. Vaginal progesterone achieves a 31% reduction in preterm birth before 33 weeks (RR 0.69, 95% CI 0.55-0.88; NNT = 14). Ultrasound-indicated cerclage achieves a 30% relative risk reduction for delivery <35 weeks. The landmark SuPPoRT trial (n = 386) demonstrated no statistically significant differences among cerclage, pessary, and progesterone (p = 0.4), though formal equivalence trials have not been conducted. Multiple gestations show no benefit from singleton-derived interventions (RR 0.99-1.04). Conclusions: Optimal cervical insufficiency management emphasizes individualized approaches based on comprehensive risk stratification and objective cervical assessment, with vaginal progesterone and cervical cerclage serving as cornerstone therapies supported by robust clinical evidence.
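The relationship between the relative risk and the number needed to treat quoted above follows from standard definitions and can be checked with a short calculation. The abstract does not report the control-group risk, so the value computed below is back-calculated under the assumption that the NNT was derived as 1 / (control risk × (1 − RR)); the function names are hypothetical.

```python
def number_needed_to_treat(control_risk, relative_risk):
    """NNT = 1 / absolute risk reduction, where ARR = control_risk * (1 - RR)."""
    arr = control_risk * (1.0 - relative_risk)
    if arr <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return 1.0 / arr


def implied_control_risk(nnt, relative_risk):
    """Control-group risk consistent with a reported NNT and RR (back-calculation)."""
    return 1.0 / (nnt * (1.0 - relative_risk))


rr = 0.69                      # reported relative risk
print(round(1.0 - rr, 2))      # relative risk reduction, i.e. the quoted 31%
risk = implied_control_risk(14, rr)
print(round(risk, 3))          # baseline risk implied by NNT = 14 (an inference, not a reported value)
print(round(number_needed_to_treat(risk, rr), 1))  # round trip back to the NNT
```

The point of the exercise is that an NNT depends on the baseline risk of the population studied, so the same relative risk would give a larger NNT in a lower-risk group.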
- Research Article
- 10.1097/ncq.0000000000000931
- Nov 13, 2025
- Journal of Nursing Care Quality
- Adeola Areo + 4 more
Background: Comparative effectiveness research (CER) provides evidence regarding which treatment may be most effective; however, there is a need to ensure CER findings are implemented in clinical practice. As such, CER implementation training was provided to advanced practice registered nurses (APRNs). Purpose: The purpose of this project was to evaluate the impact of implementation training on self-efficacy and intention to implement CER findings among APRNs. Methods: A descriptive cross-sectional study design was used. Results: A total of 18 APRNs participated between 2 training sessions. Overall, participants were satisfied with the training. Most (91%) rated their self-efficacy to implement CER findings as at least fair, and 82% intended to use CER findings in practice. Conclusion: It is imperative that nursing leaders be provided implementation training that promotes innovation in health care delivery, thereby decreasing the research-to-practice gap.
- Research Article
- 10.1001/jamaophthalmol.2025.4495
- Nov 13, 2025
- JAMA Ophthalmology
- Peter R Kastl + 2 more
This comparative effectiveness research study calculates the distances patients must travel to reach ophthalmologists in the US.
- Research Article
- 10.1200/op-25-00099
- Nov 7, 2025
- JCO oncology practice
- Jeddeo M Paul + 3 more
Multiple checkpoint inhibitors are approved for cancer treatment in the United States, accounting for nearly 20% of Medicare Part B spending. To explore why prices remain high, this study analyzed pricing trends for checkpoint inhibitors from 2015 to 2024 and the degree of overlap of their US Food and Drug Administration (FDA)-approved indications. For 11 FDA-approved checkpoint inhibitors, we studied quarterly average sales prices from public Medicare spending files from Q3 2015 to Q1 2024; prices were standardized to cost per 28-day treatment for non-small cell lung cancer. We compared the FDA-labeled indications as of January 2024 to determine the degree of overlap, defining indications on the basis of tumor type (or mutation status) and stage of treatment. Monthly prices for checkpoint inhibitors decreased slightly over the study period, largely attributable to high inflation from 2020 to 2023. Five drugs, including pembrolizumab and nivolumab, maintained prices within 7% of each other; five other drugs were introduced at prices 3%-20% lower than the existing checkpoint inhibitors. In Q1 2024, monthly prices ranged from $7,783 in US dollars (USD) (ipilimumab) to $14,872 USD (dostarlimab). We identified 55 distinct indications for the 11 drugs; of these, 24 (44%) were approved for only one drug and 16 (29%) for only two drugs. Pembrolizumab accounted for 45 of 55 (82%) total indications and 18 of 24 (75%) nonoverlapping indications. Of eight checkpoint inhibitors launched since 2015, three were initially approved for nonoverlapping uses. Prices of checkpoint inhibitors have decreased only slightly since introduction. This may be partially explained by lack of overlapping indications, which hinders direct competition among within-class drugs. Expanding drug price negotiations or incentivizing comparative effectiveness research may help to promote competition and address high prices.
- Research Article
- 10.1161/circ.152.suppl_3.4335923
- Nov 4, 2025
- Circulation
- Shuqi Zhang + 5 more
Background: Observational studies that examine the comparative effectiveness of healthcare services often face challenges in controlling for confounding by indication. This study examines whether clinical data adds value over claims data alone in addressing this bias when evaluating the effectiveness of community-based physical or occupational therapy (PT/OT) after stroke. Methods: Medicare claims data from the 6 months prior to and including the index hospitalization were linked to clinical data of 5,244 stroke survivors discharged home from 40 North Carolina hospitals. Measures of stroke severity, comorbidities, and previous healthcare utilization were derived from the claims. Clinical measures included the National Institutes of Health Stroke Scale, stroke diagnosis categories, ambulatory status, comorbidities, and therapy need. We estimated the effectiveness of any PT/OT use versus no use within 30 days of discharge. The primary outcome was 90-day functional status after discharge. We used Targeted Maximum Likelihood Estimation (TMLE) with SuperLearner and, separately, Inverse Probability of Treatment Weighting (IPTW) to control for confounding across claims-only, clinical-only, and two joint models (claims-based with unique clinical elements and clinical-based with unique claims elements). Results: Across all models in the full population (mean age, 74; 53% female; 78% White), receipt of any therapy within 30 days was unexpectedly associated with lower 90-day functional score (Figure 1). Models incorporating clinical data yielded more attenuated and consistent estimates closer to the hypothesized beneficial effect of therapy, while the addition of unique claims data elements did not change the estimates of clinical-only models (Figure 1). When the analysis was restricted to the 2,335 patients who needed therapy at discharge, there was no significant association between PT/OT use and functional score (Figure 2).
Among the estimation approaches, TMLE models yielded more theory-consistent and precise estimates than IPTW models. Conclusions: Clinical data outperformed claims data in controlling for confounding by indication. Restricting to individuals who needed therapy reduced confounding by indication. The unexpected, non-significant effect may be explained by residual confounding and/or imprecision of the PT/OT measure. Incorporating clinical measures and robust analytic approaches is essential for valid estimates in comparative effectiveness research.
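Confounding by indication, and how inverse probability of treatment weighting addresses it, can be shown on a toy example. The sketch below is not the study's analysis: the data are made up, the propensity scores are supplied rather than estimated, and the function name is an assumption. It uses the normalized (Hájek) form of the IPTW estimator.

```python
def iptw_mean_difference(treated, outcome, propensity):
    """Weighted difference in mean outcome (treated minus untreated).

    Treated units are weighted by 1/e and untreated units by 1/(1 - e),
    where e is the probability of treatment given covariates.
    """
    w1 = w0 = y1 = y0 = 0.0
    for a, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if a == 1:
            w = 1.0 / e
            w1 += w
            y1 += w * y
        else:
            w = 1.0 / (1.0 - e)
            w0 += w
            y0 += w * y
    return y1 / w1 - y0 / w0


# Made-up cohort: sicker patients are more likely to receive therapy and have
# lower functional scores; therapy adds +1 within each severity stratum.
#          severe stratum (e = 0.8)    mild stratum (e = 0.2)
treated = [1, 1, 1, 1, 0,              1, 0, 0, 0, 0]
outcome = [3, 3, 3, 3, 2,              6, 5, 5, 5, 5]
propensity = [0.8] * 5 + [0.2] * 5

n_treated = sum(treated)
naive = (sum(y for a, y in zip(treated, outcome) if a == 1) / n_treated
         - sum(y for a, y in zip(treated, outcome) if a == 0) / (len(treated) - n_treated))
print(naive)                                               # negative: therapy looks harmful
print(iptw_mean_difference(treated, outcome, propensity))  # close to the built-in effect of +1
```

In this toy cohort the unadjusted comparison has the wrong sign, echoing the unexpected association reported above; weighting recovers the built-in effect only because severity, the sole confounder, is fully captured in the propensity score.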
- Research Article
- 10.1182/blood-2025-2211
- Nov 3, 2025
- Blood
- Fayaz Khan + 2 more
Comparative real-world outcomes of bispecific antibodies versus pomalidomide-based regimens in relapsed/refractory multiple myeloma: A propensity-matched cohort analysis.
- Research Article
- 10.1001/jamanetworkopen.2025.41025
- Nov 3, 2025
- JAMA Network Open
- Shalom Haggiag + 86 more
Early treatment choice in relapsing-remitting multiple sclerosis (RRMS) is prognostically crucial, yet robust comparative data on cladribine vs sphingosine-1-phosphate receptor modulators (S1PRMs) in treatment-naive patients with RRMS are limited. To compare the clinical effectiveness of cladribine vs S1PRMs in treatment-naive individuals with RRMS. This comparative effectiveness research study used data from 108 Italian multiple sclerosis (MS) centers affiliated with the Italian Multiple Sclerosis and Related Disorders Register. All treatment-naive patients with RRMS who initiated cladribine or an S1PRM (fingolimod, ozanimod, or ponesimod) between January 2011 and October 2021 and had at least 12 months of follow-up were included. Propensity score matching and pairwise censoring were used to balance baseline differences and follow-up duration. Patient data were extracted from the register in September 2024. Initiation of cladribine or an S1PRM, with duration reflecting clinical practice. The primary outcome was no evidence of disease activity (NEDA-3) and its subcomponents. Secondary analyses evaluated disability accrual subdivided into progression independent of relapse activity (PIRA) and relapse-associated worsening (RAW), plus variables associated with treatment response. Cox proportional hazards models, adjusted for visit and magnetic resonance imaging (MRI) frequency, were used to compare outcomes. Of the 1587 patients (485 taking cladribine and 1102 taking S1PRMs), matching yielded 475 pairs (950 individuals; mean [SD] age, 34.7 [10.7] years; 686 female [72.2%]), with a median (IQR) follow-up period of 25 (12-60) months. For the cladribine vs S1PRM groups, no significant differences were observed in relapse rates (72 patients [15.2%] vs 76 patients [16.0%]), MRI activity (137 patients [31.3%] vs 145 patients [34.8%]), or NEDA-3 loss (194 patients [44.4%] vs 219 patients [52.2%]).
Cladribine was associated with a lower risk of disability worsening vs S1PRM (54 patients [11.4%] vs 70 patients [14.7%]; hazard ratio [HR], 0.64; 95% CI, 0.42-0.96; P = .03), a finding that was confirmed in sensitivity analyses for patients younger than 40 years, those whose diagnoses were made according to the 2017 McDonald Criteria, and those with Expanded Disability Status Scale score less than or equal to 3.0. This was mainly driven by reduced PIRA risk with cladribine (HR, 0.40; 95% CI, 0.20-0.79; P = .009), with no RAW difference. After 36 months, patients treated with cladribine showed higher relapse risk (HR, 1.81; 95% CI, 1.02-3.20; P = .04) and increased NEDA-3 loss (HR, 2.08; 95% CI, 1.18-3.67; P = .01). Discontinuation rates were similar (HR, 0.92; 95% CI, 0.67-1.15; P = .58). These findings suggest cladribine was associated with superior effectiveness in reducing disability progression over 25 months, likely due to reduced PIRA, despite comparable short-term NEDA-3 outcomes. However, relapse prevention declined after 36 months, suggesting retreatment or therapy modification within 3 years may be needed to maintain long-term disease control.
- Research Article
- 10.1093/jamia/ocaf137
- Nov 1, 2025
- Journal of the American Medical Informatics Association : JAMIA
- Rui Yang + 10 more
Systematic reviews in comparative effectiveness research require timely evidence synthesis. With the rapid advancement of medical research, preprint articles play an increasingly important role in accelerating knowledge dissemination. However, as preprint articles are not peer-reviewed before publication, their quality varies significantly, posing challenges for evidence inclusion in systematic reviews. We developed AutoConfidenceScore (automated confidence score assessment), an advanced framework for predicting preprint publication, which reduces reliance on manual curation and expands the range of predictors, including three key advancements: (1) automated data extraction using natural language processing techniques, (2) semantic embeddings of titles and abstracts, and (3) large language model (LLM)-driven evaluation scores. Additionally, we employed two prediction models: a random forest classifier for binary outcome and a survival cure model that predicts both binary outcome and publication risk over time. The random forest classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.747 using all features. The survival cure model achieved an AUROC of 0.731 for binary outcome prediction and a concordance index of 0.667 for time-to-publication risk. Our study advances the framework for preprint publication prediction through automated data extraction and multiple feature integration. By combining semantic embeddings with LLM-driven evaluations, AutoConfidenceScore significantly enhances predictive performance while reducing manual annotation burden. AutoConfidenceScore has the potential to facilitate incorporation of preprint articles during the appraisal phase of systematic reviews, supporting researchers in more effective utilization of preprint resources.