Introduction

While randomized phase III clinical trials (RCTs) are the gold standard for establishing drug efficacy in the United States, external control arms (ECAs) may be valuable for both early- and late-phase drug development. An ECA is a collection of patients with the disease of interest who were treated outside of the clinical trial of interest (ie, the target trial) and whose outcomes are compared with those of the target trial patients to evaluate comparative drug efficacy and/or safety. ECAs created through rigorous non-experimental statistical methods, such as matching on patient-level characteristics, may provide early estimates of comparative efficacy in advance of RCTs and offset prospective enrollment of control patients in randomized studies. We evaluate the validity of using a matched ECA to estimate the comparative efficacy of an experimental drug in relapsed/refractory multiple myeloma (r/r MM).

Methods

From the anonymized Medidata Enterprise Data Store (MEDS), we identified historical RCTs in r/r MM containing patients exposed to an experimental drug as well as an active control drug (ie, a corticosteroid). We arbitrarily selected one trial to serve as the “target trial”. For the remaining trials, we pooled all corticosteroid control patients into a single candidate pool from which to construct an ECA. Using propensity score (PS) optimal full matching, we selected from the candidate pool those corticosteroid patients whose baseline demographic and clinical attributes were similar to those of patients receiving the experimental drug in the target trial. Using the Kaplan-Meier (KM) estimator and Cox proportional hazards regression, we first compared overall survival (OS) between the experimental drug arm and the control arm in the target trial. We then compared OS between the experimental drug arm and the ECA.

Results

In MEDS, we found fewer than five historical clinical trials that were open-label RCTs conducted from 2010 to 2017.
They enrolled r/r MM patients aged ≥18 years who had received more than two prior lines of therapy, including lenalidomide and bortezomib. The target trial contained 300 patients in the investigational drug arm and 152 patients in the control arm. Using PS matching, we identified 183 patients from the control arms of the pooled trials who were matched with 290 patients in the investigational drug arm of the target trial; these 183 patients formed the ECA. The ECA patients were then upweighted to correspond to the 290 matched patients in the investigational drug arm. Full matching successfully attenuated baseline differences between the investigational drug and ECA arms. Figure 1A shows the KM curves of OS (investigational vs control) in the target trial. Median OS was 13.25 months vs 8.52 months (log-rank p<0.01), and the investigational drug was associated with a 26% reduction in the hazard of death (HR 0.74, 95% CI: 0.60-0.92). Figure 1B shows the KM curves of OS, after full matching, using the ECA. Median OS was 13.51 months vs 8.75 months (log-rank p<0.01), and the investigational drug was associated with a 26% reduction in the hazard of death (HR 0.74, 95% CI: 0.58-0.94).

Conclusions

In this methodologic validation study, we have demonstrated that it is possible to produce an ECA from historical clinical trial data using propensity score methods that replicates the comparative efficacy estimate from a target trial without introducing bias. That is, the ECA, which was well balanced with the investigational arm with respect to baseline patient attributes, yielded an estimate of comparative efficacy on OS against the target trial experimental arm that was nearly identical to the estimate obtained with the target trial's randomized control arm. This work illustrates that it is possible to create a valid ECA even when the amount of historical data is limited, and to do so without excessive exclusion of non-matched patients.
That the treatment effect on OS estimated against the randomized control was so closely matched by the estimate against the ECA suggests that this approach to ECAs may be used to (1) estimate the comparative efficacy of therapies that have only been studied in single-arm settings and/or (2) augment or replace a randomized control arm in future trials when randomization is ethically or practically challenging.

Figure 1A. Kaplan-Meier OS estimates of target trial experimental (N=300) vs. control arm (N=152)

Figure 1B. Kaplan-Meier OS estimates of target trial experimental (N=290) vs. ECA (N=290)
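
For readers less familiar with the quantities reported above, the following is a minimal, illustrative Python sketch of a (optionally weighted) Kaplan-Meier product-limit estimate, the median survival read from it, and the conversion of a hazard ratio to a percent reduction in hazard. It is not the authors' analysis code. The function names, the `weights` argument (standing in for matching weights such as those applied to the ECA), and the toy data are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float],
    events: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is 1 for an observed death and 0 for a censored patient.
    weights (hypothetical matching weights) default to 1 for every patient.
    """
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(w for _, _, w in data))  # weighted number at risk
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0.0   # weighted deaths at time t
        leaving = 0.0  # weighted deaths + censorings at time t
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            deaths += w * event
            leaving += w
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) drops to 0.5 or below (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def hazard_reduction_pct(hazard_ratio: float) -> float:
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


# Toy, made-up follow-up times in months (not data from the study).
toy_times = [2, 3, 3, 5, 8, 9, 12, 14]
toy_events = [1, 1, 0, 1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Under this convention, a hazard ratio of 0.74 corresponds to a reduction in the hazard of death of about 26%. The hazard ratio itself would come from a Cox proportional hazards model, which is not reimplemented in this sketch.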