Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses

Shirley Wang, Jessica M Franklin, Robert J Glynn, Lily G Bessette, Elvira D’Andrea, Elizabeth M Garry, Su Been Lee, Cassie York, Helen Tesfaye, Elisabetta Patorno, Ajinkya Pawar, Sushama Kattinakere Sreedhara, Hemin Lee, Sebastian Schneeweiß, Heidi Zakoul, Dianne Paraoan, David Martín, Luke Zabotka, Samy Suissa, Julie M Paik, Kueiyu Joshua Lin, Dureshahwar Jawaid, Rishi Desai, Nileesa Gautam, William B Feldman, John Concato, Kenneth Quinto

doi:10.1001/jama.2023.4221

Abstract

Importance: Nonrandomized studies using insurance claims databases can be analyzed to produce real-world evidence on the effectiveness of medical products. Given the lack of baseline randomization and measurement issues, concerns exist about whether such studies produce unbiased treatment effect estimates.

Objective: To emulate the design of 30 completed and 2 ongoing randomized clinical trials (RCTs) of medications with database studies using observational analogues of the RCT design parameters (population, intervention, comparator, outcome, time [PICOT]) and to quantify agreement in RCT-database study pairs.

Design, Setting, and Participants: New-user cohort studies with propensity score matching using 3 US claims databases (Optum Clinformatics, MarketScan, and Medicare). Inclusion-exclusion criteria for each database study were prespecified to emulate the corresponding RCT. RCTs were explicitly selected based on feasibility, including power, key confounders, and end points more likely to be emulated with real-world data. All 32 protocols were registered on ClinicalTrials.gov before conducting analyses. Emulations were conducted from 2017 through 2022.

Exposures: Therapies for multiple clinical conditions were included.

Main Outcomes and Measures: Database study emulations focused on the primary outcome of the corresponding RCT. Findings of database studies were compared with RCTs using predefined metrics, including Pearson correlation coefficients and binary metrics based on statistical significance agreement, estimate agreement, and standardized difference.

Results: In these highly selected RCTs, the overall observed agreement between the RCT and the database emulation results was a Pearson correlation of 0.82 (95% CI, 0.64-0.91), with 75% meeting statistical significance, 66% estimate agreement, and 75% standardized difference agreement. In a post hoc analysis limited to 16 RCTs with closer emulation of trial design and measurements, concordance was higher (Pearson r, 0.93; 95% CI, 0.79-0.97; 94% meeting statistical significance, 88% estimate agreement, 88% standardized difference agreement). Weaker concordance occurred among 16 RCTs for which close emulation of certain design elements that define the research question (PICOT) with data from insurance claims was not possible (Pearson r, 0.53; 95% CI, 0.00-0.83; 56% meeting statistical significance, 50% estimate agreement, 69% standardized difference agreement).

Conclusions and Relevance: Real-world evidence studies can reach similar conclusions as RCTs when design and measurements can be closely emulated, but this may be difficult to achieve. Concordance in results varied depending on the agreement metric. Emulation differences, chance, and residual confounding can contribute to divergence in results and are difficult to disentangle.
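
The abstract names the agreement metrics without defining them. As a rough illustration only, the sketch below assumes one common set of definitions: statistical significance agreement means both estimates lead to the same direction-and-significance verdict at the 95% level; estimate agreement means the database estimate falls inside the RCT's 95% CI; and standardized difference agreement means the difference in log hazard ratios, divided by its pooled standard error, is below 1.96 in absolute value. These definitions, the function names, and the use of hazard ratios are assumptions of this sketch and are not taken from the abstract.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def se_from_ci(lo, hi):
    """Standard error of log(HR), recovered from a 95% CI given on the HR scale."""
    return (math.log(hi) - math.log(lo)) / (2 * Z95)


def verdict(hr, lo, hi):
    """-1: CI entirely below 1, +1: CI entirely above 1, 0: CI includes 1."""
    if hi < 1:
        return -1
    if lo > 1:
        return 1
    return 0


def agreement(rct, rwe):
    """Compare two (hr, ci_low, ci_high) tuples under the ASSUMED definitions."""
    rct_hr, rct_lo, rct_hi = rct
    rwe_hr, rwe_lo, rwe_hi = rwe
    # Standardized difference of the two log hazard ratios.
    pooled_se = math.hypot(se_from_ci(rct_lo, rct_hi), se_from_ci(rwe_lo, rwe_hi))
    z = (math.log(rwe_hr) - math.log(rct_hr)) / pooled_se
    return {
        "significance": verdict(*rct) == verdict(*rwe),
        "estimate": rct_lo <= rwe_hr <= rct_hi,
        "standardized_difference": abs(z) < Z95,
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. paired log HRs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical numbers for illustration; NOT results from the study.
    print(agreement(rct=(0.80, 0.70, 0.91), rwe=(0.85, 0.75, 0.96)))
```

Applied to every RCT-database pair, the share of pairs for which each flag is true would give percentages of the kind reported in the Results, and `pearson` applied to the paired effect estimates would give the correlation, under the assumptions stated above.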
