Background
There is increasing recognition of the complementary role of real-world evidence (RWE) in health care and regulatory decision-making (1). However, comparing drugs using observational data requires careful analysis to account for differences between treatment groups. Electronic medical records (EMRs) are an important source of real-world data (RWD), but outcomes are often recorded incompletely.

We emulated a target trial of adalimumab (ADA) versus tofacitinib (TOF) in patients with rheumatoid arthritis (RA) using the OPAL dataset, to illustrate methods that address the challenges of non-random treatment assignment and incomplete data. The OPAL dataset is derived from the EMRs of 112 community-based rheumatologists around Australia, where practitioners have discretion to prescribe whichever biologic or targeted synthetic disease-modifying antirheumatic drug (b/tsDMARD) they consider most clinically appropriate.

Objectives
To estimate the average treatment effect (ATE) of TOF compared to ADA at 3 and 9 months, defined as the difference in mean disease activity score (DAS28CRP), in patients with RA who are new users of a b/tsDMARD. This is equivalent to estimating the intention-to-treat effect in a randomised controlled trial.

Methods
OPAL patients diagnosed with RA were included if they initiated ADA or TOF between 1 October 2015 and 1 April 2021, were new b/tsDMARD users (no prior recorded b/tsDMARD and at least 6 months of prior treatment with a conventional synthetic DMARD [csDMARD]), and had at least one component of DAS28CRP recorded at baseline or during follow-up. The baseline characteristics extracted were DAS28CRP, patient demographics, regional location, disease duration, prescriber characteristics (including gender and experience), prior recorded comorbidities, and prior and concomitant treatment with csDMARDs and oral corticosteroids.

We used random forest multiple imputation to impute missing baseline and follow-up DAS28CRP components (2).
Stable balancing weights (SBW) were then used to balance the treatment groups on their baseline characteristics, including DAS28CRP (3). For each imputed dataset, the ATE at 3 months was estimated as the difference between the mean outcomes in the two treatment groups after balancing (i.e. weighting) the sample; these estimates were then averaged across the 10 imputed datasets. The ATE at 9 months was estimated similarly. The whole procedure was then repeated in 1000 bootstrap samples to estimate a 95% confidence interval (CI) for each ATE using the percentile method (4).

Results
A total of 842 patients were identified: 569 treated with ADA and 273 treated with TOF. After applying the SBW, the maximum difference between the ADA and TOF groups in the mean of any baseline characteristic was less than 0.03% of the corresponding standard deviation in the whole sample, indicating that reasonable balance was achieved in this complex dataset. After weighting, mean DAS28CRP fell from 5.3 at baseline (both ADA and TOF groups) to 2.6 and 2.3 at 3 and 9 months for ADA, and to 2.4 and 2.3 at 3 and 9 months for TOF.

The estimated ATE was -0.22 (95% CI -0.36, -0.03; p=0.02) at 3 months, indicating a modest but statistically significant reduction in disease activity for patients on TOF relative to ADA. The estimated ATE was -0.03 (95% CI -0.19, 0.1; p=0.56) at 9 months, indicating no evidence of a difference between groups.

Conclusion
DAS28CRP was significantly lower at 3 months for patients treated with TOF compared to ADA. However, 3 months of treatment with either drug led to substantive reductions in mean DAS28CRP, consistent with remission. There was no evidence of a difference between drugs at 9 months. Future work will estimate a per-protocol effect.
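
The estimation steps described in the Methods (weighted difference in means, averaging across imputed datasets, percentile bootstrap) can be illustrated with a minimal Python sketch. It is not the analysis code used in this study. The function names are hypothetical, the balancing weights are assumed to have been computed already (the SBW optimisation and the random forest imputation are not implemented here), and the sign convention (TOF minus ADA) is inferred from the reported results. `estimate_on_indices` is an assumed user-supplied callable that would re-run imputation, weighting and estimation on the resampled patients.

```python
import random
from statistics import fmean


def group_mean(outcome, weights, in_group):
    """Weighted mean of the outcome among patients flagged in in_group."""
    num = sum(y * w for y, w, g in zip(outcome, weights, in_group) if g)
    den = sum(w for w, g in zip(weights, in_group) if g)
    return num / den  # raises ZeroDivisionError if the group is empty


def weighted_mean_difference(outcome, is_tof, weights):
    """Weighted mean outcome in the TOF group minus that in the ADA group."""
    is_ada = [not t for t in is_tof]
    return group_mean(outcome, weights, is_tof) - group_mean(outcome, weights, is_ada)


def pooled_ate(imputed_outcomes, is_tof, imputed_weights):
    """Average the per-imputation estimates into a single point estimate."""
    estimates = [
        weighted_mean_difference(y, is_tof, w)
        for y, w in zip(imputed_outcomes, imputed_weights)
    ]
    return fmean(estimates)


def percentile(sorted_values, q):
    """Linearly interpolated quantile q (between 0 and 1) of an ascending list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def bootstrap_percentile_ci(n_patients, estimate_on_indices,
                            n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for an estimator of resampled patients.

    estimate_on_indices(indices) is assumed to repeat the whole procedure
    (imputation, weighting, estimation) on the patients listed in indices.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement.
        indices = [rng.randrange(n_patients) for _ in range(n_patients)]
        estimates.append(estimate_on_indices(indices))
    estimates.sort()
    tail = (1 - level) / 2
    return percentile(estimates, tail), percentile(estimates, 1 - tail)
```

In this sketch the point estimate would come from `pooled_ate` applied to the imputed datasets, and the interval from `bootstrap_percentile_ci` with a callable that wraps the full pipeline. Resampling patients before imputation and weighting, rather than after, lets the interval reflect uncertainty from those steps as well.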