Abstract
Background: Non-inferiority trials are increasingly used to evaluate new treatments that are expected to have secondary advantages over standard of care, but similar efficacy on the primary outcome. When designing a non-inferiority trial with a binary primary outcome, the choice of effect measure for the non-inferiority margin (e.g. risk ratio or risk difference) has an important effect on sample size calculations; furthermore, if the control event risk observed is markedly different from that assumed, the trial can quickly lose power or the results become difficult to interpret.
Methods: We propose a new way of designing non-inferiority trials to overcome the issues raised by unexpected control event risks. Our proposal involves using clinical judgement to specify a ‘non-inferiority frontier’, i.e. a curve defining the most appropriate non-inferiority margin for each possible value of control event risk. Existing trials implicitly use frontiers defined by a fixed risk ratio or a fixed risk difference. We discuss their limitations and propose a fixed arcsine difference frontier, using the power-stabilising transformation for binary outcomes, which may better represent clinical judgement. We propose and compare three ways of designing a trial using this frontier: testing and reporting on the arcsine scale; testing on the arcsine scale but reporting on the risk difference or risk ratio scale; and modifying the margin on the risk difference or risk ratio scale after observing the control event risk according to the power-stabilising frontier.
Results: Testing and reporting on the arcsine scale leads to results which are challenging to interpret clinically. For small values of control event risk, testing on the arcsine scale and reporting results on the risk difference scale produces confidence intervals at a higher level than the nominal one or non-inferiority margins that are slightly smaller than those back-calculated from the power-stabilising frontier alone. However, working on the arcsine scale generally requires a larger sample size compared to the risk difference scale. Therefore, working on the risk difference scale and modifying the margin after observing the control event risk might be preferable, as it requires a smaller sample size. However, this approach tends to slightly inflate the type I error rate; a solution is to use a slightly lower significance level for testing, although this modestly reduces power. When working on the risk ratio scale instead, the same approach based on modifying the margin leads to power levels above the nominal one while keeping type I error under control.
Conclusions: Our proposed methods of designing non-inferiority trials using power-stabilising non-inferiority frontiers make trial design more resilient to unexpected values of the control event risk, at the sole cost of requiring somewhat larger sample sizes when the goal is to report results on the risk difference scale.
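As an illustration of the third design option (modifying the risk difference margin after observing the control event risk), the following minimal Python sketch computes the margin implied by a fixed arcsine difference frontier. It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the frontier is assumed to be anchored at an expected control event risk of 5% and a maximum tolerable experimental event risk of 10%.

```python
import math


def arcsine_difference(pi_e0, pi_f1):
    """Fixed arcsine difference implied by the design values.

    pi_e0: expected control event risk (assumed design value)
    pi_f1: maximum tolerable experimental event risk at pi_e0 (assumed)
    """
    return math.asin(math.sqrt(pi_f1)) - math.asin(math.sqrt(pi_e0))


def frontier_max_risk(pi0, delta):
    """Maximum tolerable experimental risk on the frontier at control risk pi0."""
    # Shift the control risk by delta on the arcsine scale, then back-transform.
    # The angle is capped at pi/2 so the back-transformed risk never exceeds 1.
    theta = min(math.asin(math.sqrt(pi0)) + delta, math.pi / 2)
    return math.sin(theta) ** 2


def risk_difference_margin(pi0, delta):
    """Risk difference margin read off the frontier at control risk pi0."""
    return frontier_max_risk(pi0, delta) - pi0


# Illustrative design values (assumptions for this sketch).
delta = arcsine_difference(0.05, 0.10)
for observed_risk in (0.01, 0.05, 0.10):
    margin = risk_difference_margin(observed_risk, delta)
    print(f"control risk {observed_risk:.2f}: risk difference margin {margin:.4f}")
```

Under these assumptions the margin equals 5 percentage points at the anticipated control risk, shrinks when the observed control risk is lower, and widens when it is higher (for risks below 50%).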
Highlights
Often a new treatment is expected not to have greater efficacy than the standard treatment, but to provide advantages in terms of costs, side-effects or acceptability
The results provided strong evidence of non-inferiority based on the prespecified non-inferiority margin as a risk difference, but they were consistent with a threefold increase in risk based on the risk ratio, and so the authors did not conclude non-inferiority
In an example trial with one-sided α = 2.5%, power = 90%, π_e0 = 5%, and π_f1 = 10%, the sample size to show non-inferiority on the arcsine scale (568 patients/group) is larger than on the risk difference scale (400 patients/group; 5% absolute margin); choosing the arcsine frontier may require up to 40% more patients
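The sample sizes in the last highlight can be checked with standard normal-approximation formulas. The sketch below is hedged: it assumes equal allocation, that both arms truly share the expected control event risk of 5%, and the textbook formulas for a risk difference margin and for a difference on the arcsine square-root scale. The function names are hypothetical, and the authors' exact calculation may differ, although under these assumptions the sketch returns the quoted 400 and 568 patients per group.

```python
import math
from statistics import NormalDist


def z_sum_squared(alpha, power):
    """(z_{1-alpha} + z_{power})^2 for a one-sided test."""
    norm = NormalDist()
    return (norm.inv_cdf(1 - alpha) + norm.inv_cdf(power)) ** 2


def n_risk_difference(pi_e0, margin, alpha=0.025, power=0.90):
    """Patients per group, risk difference scale, both arms at risk pi_e0."""
    variance = 2 * pi_e0 * (1 - pi_e0)  # sum of the two binomial variances
    return math.ceil(z_sum_squared(alpha, power) * variance / margin ** 2)


def n_arcsine(pi_e0, pi_f1, alpha=0.025, power=0.90):
    """Patients per group, arcsine scale (variance of asin(sqrt(p)) ~ 1/(4n))."""
    delta = math.asin(math.sqrt(pi_f1)) - math.asin(math.sqrt(pi_e0))
    return math.ceil(z_sum_squared(alpha, power) / (2 * delta ** 2))


print(n_risk_difference(0.05, 0.05))  # 5% absolute margin
print(n_arcsine(0.05, 0.10))
```

The arcsine-scale variance does not depend on the event risk, which is why the transformation is described as power-stabilising; the price in this example is roughly 40% more patients per group.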
Summary
Often a new treatment is expected not to have greater efficacy than the standard treatment, but to provide advantages in terms of costs, side-effects or acceptability. The goal might be to preserve a certain proportion of the effect of the standard relative to placebo, which can be formulated as either an absolute or relative margin. In this case, we refer to the maximum tolerable effect size as M2 (where M2 = x% of M1). Non-inferiority trials are increasingly used to evaluate new treatments that are expected to have secondary advantages over standard of care, but similar efficacy on the primary outcome. When designing a non-inferiority trial with a binary primary outcome, the choice of effect measure for the non-inferiority margin (e.g. risk ratio or risk difference) has an important effect on sample size calculations; if the control event risk observed is markedly different from that assumed, the trial can quickly lose power or the results become difficult to interpret.