Purpose: Many clinical trials of promising interventions fail to show the anticipated effects at follow-up. To avoid carrying out a full trial when the tested treatment is likely to be inefficacious, researchers may include one or more interim analyses that assess futility before the final analysis, discontinuing the trial early if interim effects are smaller than expected. Current approaches test whether the trial is likely to show futility, not whether the treatment is likely to be inefficacious; treatments that show efficacy but with wide confidence bounds around the efficacy estimate may therefore be stopped inappropriately for futility. We present an alternative approach that focuses on the expected efficacy of the treatment. The approach uses “futility regions” to determine whether to stop trials of treatments that are unlikely to show a clinically important effect at interim: the stopping rule requires the interim interval estimate to lie entirely within a “stopping zone”, whereas current approaches only require the estimate to cross into the region of null/futile effects. We contrast this with extant methods, highlighting how the proposed approach focuses on a clinically relevant question and protects against undue stopping because of imprecise estimates. For a researcher wanting to gather evidence about efficacious treatments, this is a desirable characteristic; trials with precise interim estimates of null effects should be stopped, whereas those with imprecise estimates that include useful effect magnitudes should be allowed to continue.

Method: We conducted a simulation study, testing 1000 permutations of six parallel-design trial scenarios, each featuring one interim analysis and stopping only for futility. The scenarios cover a range of effect sizes and variances. We implemented four types of interval-based stopping rules, using frequentist and Bayesian approaches, and compared their performance with the more commonly used O’Brien-Fleming and Pocock designs. We used 15 stopping rules in total to demonstrate the limits of the approaches across the different trial scenarios.

Results: All approaches discriminated well between trials with a final null effect and those with a final large treatment effect. However, when the trial variance differed from that expected at the design stage, the interval-based approaches demonstrated favorable characteristics: they continued more permutations with imprecise but promising estimates and stopped more of those with precise estimates of clinically trivial or null (futile) effects (table). Empirical Bayesian approaches showed greater precision than the other methods because they allow interim estimates to inform the prior distribution of the final effect estimate.

Conclusion: A simple implementation of adaptive trial methodology, featuring only one interim analysis and a modified version of interval-estimate stopping rules, showed performance characteristics preferable to those of current approaches for the researcher seeking to stop only trials that precisely estimate null effects. These characteristics have the benefit of producing two meaningful trial endpoints: completed trials that continued to show promise at interim, and stopped trials with precise estimates of a null effect. This contrasts with a trial that stops early with an imprecise estimate of effect, about which few conclusions can be drawn.
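To make the interval-based rule concrete, the sketch below simulates a single interim look in a two-arm parallel trial and stops for futility only when the entire confidence interval for the treatment effect lies inside the futility region (here, effects below a minimal clinically important difference, MCID). This is not the authors’ code; the effect size, standard deviation, sample sizes, MCID, and confidence level are all illustrative assumptions.

```python
# Minimal sketch of an interval-based futility rule at one interim look.
# All parameter values below are illustrative assumptions, not the trial designs
# used in the simulation study.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def interval_futility_stop(treatment, control, mcid, conf_level=0.95):
    """Return True if the two-sided CI for the mean difference lies entirely
    below the MCID (i.e., wholly inside the futility region)."""
    diff = treatment.mean() - control.mean()
    se = np.sqrt(treatment.var(ddof=1) / len(treatment) +
                 control.var(ddof=1) / len(control))
    df = len(treatment) + len(control) - 2           # simple pooled df
    half_width = stats.t.ppf(1 - (1 - conf_level) / 2, df) * se
    upper = diff + half_width
    return upper < mcid                               # whole interval is futile

# Illustrative interim data: 50 patients per arm, true effect 0.2 SD, MCID 0.5 SD
treatment = rng.normal(loc=0.2, scale=1.0, size=50)
control = rng.normal(loc=0.0, scale=1.0, size=50)

if interval_futility_stop(treatment, control, mcid=0.5):
    print("Stop for futility: the interval excludes clinically important effects.")
else:
    print("Continue: the interval still includes clinically important effects.")
```

Under this rule, a wide interval that straddles the MCID does not trigger stopping, whereas a conventional futility boundary based only on the interim point estimate or test statistic might; this is the protection against stopping on imprecise estimates described above.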