Evaluating the Impact of Outcome Delay on the Efficiency of Two-Arm Group-Sequential Trials
Group sequential designs (GSDs) can improve trial efficiency by allowing the trial to terminate early for either futility or efficacy. However, a delay in observing the primary outcome can reduce this efficiency. This article aims to determine the size of outcome delay at which the realized efficiency gains (EGs) of GSDs become negligible compared with a classical fixed-sample alternative. We measure the impact of delay by developing formulae for the number of "pipeline participants" (participants already recruited whose outcomes have not yet been observed at an interim analysis) in two-arm GSDs with normal data, assuming different recruitment models. Our formulae measure the EG from a GSD in terms of reduction in expected sample size. The results indicate that GSDs can suffer considerable losses due to outcome delay, as the expected EGs with and without consideration of delay differ significantly. Even a small delay can have a substantial impact on the trial's efficiency. Nevertheless, even in the presence of substantial delay, GSDs have a shorter expected time to trial completion than a single-stage trial. As the number of stages increases, the efficiency loss increases. The timing of the interim analyses (IAs) can also considerably affect the efficiency of a GSD with delayed outcomes; in particular, conducting IAs too early in the trial can harm the design's efficiency.
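A minimal sketch of how pipeline participants erode the efficiency gain, assuming uniform recruitment at a constant rate, a fixed outcome delay, a single interim analysis, and that everyone already recruited when the trial stops counts towards the sample size. This is not the article's own formula; the function names, the stopping probability, and all numbers are hypothetical.

```python
def pipeline_size(recruit_rate: float, delay: float, n_interim: int, n_max: int) -> float:
    """Participants recruited but not yet assessed at the interim analysis.

    Assumes uniform recruitment (recruit_rate per unit time) and a fixed outcome
    delay; recruitment can never exceed the maximum sample size n_max.
    """
    return min(recruit_rate * delay, n_max - n_interim)


def expected_n(n1: int, n_max: int, p_stop: float,
               recruit_rate: float = 0.0, delay: float = 0.0) -> float:
    """Expected sample size of a two-stage GSD with one interim after n1 outcomes."""
    # If the trial stops at the interim, the pipeline participants are still enrolled.
    n_if_stop = n1 + pipeline_size(recruit_rate, delay, n1, n_max)
    return p_stop * n_if_stop + (1 - p_stop) * n_max


def efficiency_gain(e_n: float, n_fixed: int) -> float:
    """Relative reduction in expected sample size versus a fixed-sample design."""
    return 1 - e_n / n_fixed


# Hypothetical numbers: fixed design 200, GSD maximum 210, interim after 105 outcomes.
no_delay = efficiency_gain(expected_n(105, 210, p_stop=0.5), 200)
with_delay = efficiency_gain(expected_n(105, 210, p_stop=0.5, recruit_rate=10, delay=6), 200)
print(round(no_delay, 4), round(with_delay, 4))  # 0.2125 0.0625
```

In this toy setting, a delay of six months at ten participants per month removes most of the expected saving, because the 60 pipeline participants are enrolled whether or not the trial stops.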
- Research Article
- 10.1186/s12874-025-02681-4
- Nov 12, 2025
- BMC medical research methodology
Ventilator-associated pneumonia (VAP) is an important healthcare-acquired infection associated with high morbidity and mortality. Conducting conventional randomised controlled trials (RCTs) on VAP prevention is often challenging because of the low numbers of eligible patients and events per site, especially for pathogen-specific interventions. We explored how group sequential design (GSD) and sample size re-estimation (SSR) could improve RCT efficiency in simulated superiority trials to prevent VAP. Simulations were informed by data from the prospective observational Hospital Network Study - Preparation for a Randomised Evaluation of anti-Pneumonia Strategies (HONEST-PREPS). We tested the impact of different GSD and SSR designs on expected sample size (allowing for early stopping) and maximum sample size (no early stopping). We varied the type of stopping boundary, the number and timing of interim analyses, and the assumed and true prevention effects. We applied time-to-event analyses, with effect estimates expressed as hazard ratios, for the primary endpoint. The estimated 28-day cumulative incidence of VAP in HONEST-PREPS was 15.5%. For a 30% reduction in VAP (hazard ratio of 0.68), a standard RCT (power 80%) would require a sample size of 1291 patients. With a single interim analysis at its optimal placement (48% and 64% of the maximum number of events for Pocock and O'Brien-Fleming boundaries, respectively), Pocock boundaries result in a smaller expected sample size (E[N] = 1128) but a larger maximum sample size (max(N) = 1578) than O'Brien-Fleming boundaries (E[N] = 1170 and max(N) = 1389). SSR is more efficient than GSD when an incorrect prevention effect is initially assumed in planning the trial, as it maintains power closer to the pre-specified target without a substantial impact on the expected sample size.
GSD and SSR are effective adaptive designs and are preferable to fixed RCTs in a superiority trial comparing the effectiveness of an investigational intervention with a standard of care in preventing VAP among critically ill, ventilated patients. They can reduce the expected sample size by 9% to 12% and should be considered at the trial design stage.
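Pocock and O'Brien-Fleming boundaries at an arbitrary interim information fraction are often implemented through Lan-DeMets error-spending functions that approximate them. The sketch below assumes those approximations with two-sided α = 0.05, not the exact boundaries used in the study; the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def obf_spending(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent at information fraction t."""
    if t <= 0:
        return 0.0
    z = _STD.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD.cdf(z / math.sqrt(t)))


def pocock_spending(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets Pocock-type cumulative alpha spent at information fraction t."""
    if t <= 0:
        return 0.0
    return alpha * math.log(1 + (math.e - 1) * t)


def first_look_bound(spent: float) -> float:
    """Two-sided critical |Z| at the first interim, given the alpha spent there."""
    return _STD.inv_cdf(1 - spent / 2)


# Interim placements reported in the abstract (fractions of the maximum number of events).
a_obf, a_poc = obf_spending(0.64), pocock_spending(0.48)
print(f"OBF-type:    alpha spent {a_obf:.4f}, |Z| bound {first_look_bound(a_obf):.3f}")
print(f"Pocock-type: alpha spent {a_poc:.4f}, |Z| bound {first_look_bound(a_poc):.3f}")
```

At these placements the OBF-type function spends roughly 0.014 of the 0.05 at the interim and the Pocock-type function roughly 0.030. Spending more α early makes early stopping more likely but leaves less for the final analysis, which is consistent with the Pocock design's smaller E[N] and larger max(N).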
- Dissertation
- 10.53846/goediss-7186
- Feb 21, 2022
Cardiovascular diseases are diseases of the heart and blood vessels and a major cause of death and disability worldwide. Cardiovascular drug development aims to deliver efficacious drugs to address the public health burden of cardiovascular diseases. However, the high costs associated with cardiovascular drug development, for example due to long-running clinical trials that sometimes include thousands of patients, place a high burden on the development of new efficacious treatments for cardiovascular diseases. Proposals for improving the efficiency of cardiovascular drug development include better disease characterization, more clearly defined target populations, and the use of adaptive clinical trial designs. This dissertation focuses on adaptive clinical trial designs for cardiovascular research.
Adaptive clinical trial designs, commonly referred to as adaptive designs, are clinical trial designs with a preplanned modification of design aspects based on interim data of the ongoing trial, under constraints such as preserving the integrity and validity of the trial. Design aspects that are commonly modified include the sample size, the number of doses or treatments, and the endpoints. Compared with traditional clinical trials with a fixed design, adaptive designs offer the flexibility to accommodate newly gained information. However, this flexibility comes with increased statistical complexity: adaptive designs require more effort to control the probability that the trial declares efficacy of an inefficacious treatment (the type I error rate), and to plan the number of patients required so that an efficacious treatment is detected with high statistical power.
The focus of this dissertation is on two types of adaptive designs: group sequential designs and designs with a nuisance parameter based sample size re-estimation.
In group sequential designs, the efficacy of a treatment is tested repeatedly during the conduct of the trial, and the trial is stopped early if efficacy can be shown with statistical significance. Thus, an efficacious treatment can be detected early in clinical trials with a group sequential design. In designs with a nuisance parameter based sample size re-estimation, the final sample size is adjusted using interim estimates of one or more nuisance parameters. Examples of nuisance parameters are the outcome variance in trials with continuous outcomes and the overall event rate in trials with count outcomes. Nuisance parameter based sample size re-estimation aims to ensure that a clinical trial achieves the target power regardless of the initially planned sample size.
The first objective of this dissertation is to study group sequential designs with recurrent events, motivated by clinical trials in patients with chronic heart failure. In such trials, a common clinically relevant recurrent event outcome is the number of heart failure hospitalizations, which can also be part of a composite endpoint together with cardiovascular death. To model heart failure hospitalizations and the respective composite, a negative binomial model and a more robust semiparametric model have been proposed in the literature. However, group sequential designs have not been studied for these models. Therefore, I propose statistical methods for planning and analyzing group sequential designs for negative binomial models and more robust semiparametric models and study their asymptotic properties. Moreover, I show that the proposed planning and analysis methods result in appropriate power and type I error rate, respectively, for parameter combinations common in clinical trials with patients suffering from chronic heart failure.
I put a particular focus on the longitudinal nature of the recurrent events, i.e., a single subject can experience new events throughout the trial, and its consequences for group sequential designs. The longitudinal nature of the outcomes distinguishes group sequential designs with recurrent events from group sequential designs for other common models, such as continuous, binary, or survival data.
A second objective of this dissertation is to study nuisance parameter based sample size re-estimation in three-arm trials with normal outcomes, an investigation motivated by clinical trials in patients with hypertension. A common endpoint in these trials, modeled as normally distributed, is the change in blood pressure between the baseline measurement and the end of the trial. I show that the ideas for nuisance parameter based sample size re-estimation in two-arm trials can be adapted to three-arm trials, and highlight that the corresponding approaches do not result in the desired target power. Furthermore, I modify one of the sample size re-estimation procedures such that it results in appropriately powered three-arm clinical trials.
The third objective of this dissertation is to study incorporating prior information on the variance into nuisance parameter based sample size re-estimation in two-arm trials with normal outcomes. This objective, too, is motivated by clinical trials in patients with hypertension. I propose several ad hoc rules for incorporating prior information into the sample size re-estimation, and by means of Monte Carlo simulation studies I show that incorporating prior information can reduce the variability of the final sample size when no prior-data conflict is present. However, I illustrate that in the presence of a prior-data conflict, designs with a sample size re-estimation that incorporates prior information do not achieve the target power.
I also highlight that common approaches to robustifying the prior information cannot completely mitigate the negative effects of a prior-data conflict without also nullifying the benefits of incorporating prior information on the nuisance parameter into the sample size re-estimation.
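For a two-arm trial with a normal outcome, nuisance parameter based re-estimation amounts to recomputing the usual sample size formula with an interim estimate of the outcome variance. A minimal sketch, assuming a two-sided two-sample z-test approximation, 1:1 allocation, and the blinded "lumped" one-sample variance of the pooled interim data (which slightly overestimates the within-group variance when a treatment effect is present). The names and data are illustrative, and prior information, as studied in the dissertation, is not incorporated.

```python
import math
from statistics import NormalDist, variance


def n_per_group(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group size for a two-sided two-sample z-test with common SD sigma."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2)


def blinded_reestimate(pooled_interim: list, delta: float, n_initial: int,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Re-estimate the per-group size from the blinded (lumped) interim variance.

    The final size is not allowed to drop below the initially planned size,
    which is one common restriction among several used in practice.
    """
    sigma_hat = math.sqrt(variance(pooled_interim))  # ignores treatment labels
    return max(n_initial, n_per_group(sigma_hat, delta, alpha, power))


n0 = n_per_group(sigma=1.0, delta=0.5)          # planning assumption: SD = 1
interim = [-2.0, 2.0] * 10                      # hypothetical pooled interim data
print(n0, blinded_reestimate(interim, delta=0.5, n_initial=n0))
```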
- Research Article
24
- 10.1002/sim.5662
- Oct 19, 2012
- Statistics in Medicine
Adaptive clinical trial design has been proposed as a promising new approach that may improve the drug discovery process. Proponents of adaptive sample size re-estimation promote its ability to avoid 'up-front' commitment of resources, better address the complicated decisions faced by data monitoring committees, and minimize accrual to studies having delayed ascertainment of outcomes. We investigate aspects of adaptation rules, such as timing of the adaptation analysis and magnitude of sample size adjustment, that lead to greater or lesser statistical efficiency. Owing in part to the recent Food and Drug Administration guidance that promotes the use of pre-specified sampling plans, we evaluate alternative approaches in the context of well-defined, pre-specified adaptation. We quantify the relative costs and benefits of fixed sample, group sequential, and pre-specified adaptive designs with respect to standard operating characteristics such as type I error, maximal sample size, power, and expected sample size under a range of alternatives. Our results build on others' prior research by demonstrating in realistic settings that simple and easily implemented pre-specified adaptive designs provide only very small efficiency gains over group sequential designs with the same number of analyses. In addition, we describe optimal rules for modifying the sample size, providing efficient adaptation boundaries on a variety of scales for the interim test statistic for adaptation analyses occurring at several different stages of the trial. We thus provide insight into what are good and bad choices of adaptive sampling plans when the added flexibility of adaptive designs is desired.
- Research Article
3
- 10.1080/10543406.2023.2170403
- Feb 20, 2023
- Journal of biopharmaceutical statistics
Cancer immunotherapy trials are frequently characterized by delayed treatment effects, so that the proportional hazards assumption is violated and the log-rank test suffers a substantial loss of statistical power. To increase the efficiency of the trial design, a variety of weighted log-rank tests have been proposed for fixed-sample and group sequential trial designs. However, in such group sequential designs, futility interim monitoring is often not recommended because a delayed treatment effect could result in a high false-negative rate. To resolve this problem, we propose a group sequential design using a piecewise weighted log-rank test, which provides an event-driven approach based on the number of events after the delay time. That is, the interim looks are not conducted until the planned number of events has been observed after the delay time. This avoids false-negative decisions caused by the delayed treatment effect. Furthermore, because it is event-driven, the proposed group sequential design is robust to the underlying survival, accrual, and censoring distributions. Group sequential designs using the Fleming-Harrington (ρ, γ) weighted log-rank test and a new weighted log-rank test are also discussed.
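To make the weighting concrete, the sketch below computes a standardized weighted log-rank statistic with Fleming-Harrington (ρ, γ) weights w(t) = S(t-)^ρ (1 - S(t-))^γ, where S is the pooled Kaplan-Meier estimate just before t. The optional `delay` argument, which sets the weights to zero up to an assumed delay time, is only a simple stand-in for a piecewise weight and is not the authors' proposed test; names and data are hypothetical, and tied events use the usual hypergeometric variance.

```python
import math
from typing import Optional, Sequence


def weighted_logrank_z(times: Sequence[float], events: Sequence[int], groups: Sequence[int],
                       rho: float = 0.0, gamma: float = 0.0,
                       delay: Optional[float] = None) -> float:
    """Standardized weighted log-rank statistic comparing group 1 with group 0.

    Positive values mean group 1 has more events than expected under H0.
    rho = gamma = 0 gives the ordinary log-rank test.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    surv = 1.0                     # pooled Kaplan-Meier estimate S(t-)
    num, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                   # at risk
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)
        d = sum(1 for tt, e, _ in data if tt == t and e)             # events at t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        w = surv**rho * (1.0 - surv)**gamma
        if delay is not None and t <= delay:
            w = 0.0                                                   # ignore early events
        expected = d * n1 / n
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        num += w * (d1 - expected)
        var += w * w * v
        surv *= 1.0 - d / n                                           # update KM after t
    return num / math.sqrt(var) if var > 0 else 0.0


# Hypothetical data: follow-up time, event indicator (1 = event), arm (1 = experimental)
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
arms = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
print(weighted_logrank_z(times, events, arms))              # ordinary log-rank
print(weighted_logrank_z(times, events, arms, 0, 1))        # FH(0, 1): emphasizes late events
print(weighted_logrank_z(times, events, arms, delay=4))     # zero weight up to t = 4
```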
- Research Article
- 10.1186/s12874-024-02363-7
- Oct 17, 2024
- BMC Medical Research Methodology
Background: In group-sequential designs, it is typically assumed that there is no time gap between patient enrollment and outcome measurement in clinical trials. In practice, however, there is usually a lag between the two time points. This can affect the statistical analysis of the data, especially in trials with interim analyses. One approach to address delayed responses was introduced by Hampson and Jennison (J R Stat Soc Ser B Stat Methodol 75:3-54, 2013), who proposed error-spending stopping boundaries for patient enrollment, followed by critical values for rejecting the null hypothesis if the stopping boundaries have been crossed beforehand. When choosing a trial design, it is important to consider its efficiency, e.g. in terms of the probability of trial success (power) and required resources (sample size and time).
Methods: This article aims to shed more light on the comparative performance of group sequential clinical trial designs that account for delayed responses and designs that do not. Suitable performance measures are described, and designs are evaluated using the R package rpact. In doing so, we provide insight into global performance measures, discuss the applicability of conditional performance characteristics, and finally consider whether the performance gain justifies the use of complex trial designs that incorporate delayed responses.
Results: We investigated how the delayed response group sequential test (DR-GSD) design proposed by Hampson and Jennison (J R Stat Soc Ser B Stat Methodol 75:3-54, 2013) can be extended to include nonbinding lower recruitment stopping boundaries, illustrating that their original design framework can accommodate both binding and nonbinding rules when additional constraints are imposed. Our findings indicate that the performance enhancements from methods incorporating delayed responses rely heavily on the sample size at interim and the volume of data in the pipeline, with overall performance gains being limited.
Conclusion: This research extends the existing literature on group-sequential designs by offering insights into differences in performance. We conclude that, given the overall marginal differences, discussions regarding appropriate trial designs can pivot towards practical considerations of operational feasibility.
- Research Article
19
- 10.1177/0272989x04269240
- Oct 1, 2004
- Medical decision making : an international journal of the Society for Medical Decision Making
Comparative diagnostic accuracy (CDA) studies are typically small retrospective studies supporting a higher accuracy for one modality over another, either for staging a particular disease or for assessing response to therapy, and they are used to generate hypotheses for larger prospective trials. The purpose of this article is to introduce the group sequential design (GSD) approach in planning these larger trials. The methodology needed to use GSD in CDA studies has recently been developed. In this article, GSD with the O'Brien and Fleming (OBF) stopping rule is described, and guidelines for sample size calculation are provided. Simulated data are used to demonstrate the application of GSD in the design and analysis of a clinical trial in the CDA study setting. The expected sample size needed for planning a trial with GSD (under the OBF stopping rule) is slightly inflated but may ultimately result in greater savings of patient resources. GSD is a specialized statistical method that helps balance the ethical and financial advantages of stopping a study early against the risk of an incorrect conclusion, and it should be adopted for planning CDA studies.
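The operating characteristics of an O'Brien-Fleming rule can be checked by simulation. The sketch below assumes a generic two-look design on a standardized normal statistic, not the CDA-specific methodology, with OBF critical values c·sqrt(K/k) and the commonly tabulated constant c ≈ 1.977 for K = 2 looks and two-sided α = 0.05. `theta` is a hypothetical standardized effect per observation, and the function name is illustrative.

```python
import math
import random


def simulate_obf(theta: float, n_stage: int, c: float = 1.977,
                 n_sims: int = 100_000, seed: int = 2024):
    """Rejection rate and expected sample size of a two-look OBF design (unit variance)."""
    rng = random.Random(seed)
    bound1, bound2 = c * math.sqrt(2), c          # OBF: c * sqrt(K / k), K = 2
    rejections, total_n = 0, 0
    for _ in range(n_sims):
        s1 = rng.gauss(theta * n_stage, math.sqrt(n_stage))      # stage-1 score sum
        if abs(s1 / math.sqrt(n_stage)) >= bound1:               # stop at the interim
            rejections += 1
            total_n += n_stage
            continue
        s2 = s1 + rng.gauss(theta * n_stage, math.sqrt(n_stage))
        rejections += abs(s2 / math.sqrt(2 * n_stage)) >= bound2
        total_n += 2 * n_stage
    return rejections / n_sims, total_n / n_sims


alpha_hat, en_null = simulate_obf(theta=0.0, n_stage=100)
power_hat, en_alt = simulate_obf(theta=0.2, n_stage=100)
print(f"type I error ~ {alpha_hat:.4f}, E[N] under H0 ~ {en_null:.1f}")
print(f"power ~ {power_hat:.3f}, E[N] under theta = 0.2 ~ {en_alt:.1f}")
```

The very stringent first boundary (about 2.80) keeps the final critical value close to the fixed-sample value, which is why the maximum sample size of an OBF design is only slightly inflated.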
- Research Article
25
- 10.1002/1521-4036(200111)43:7<821::aid-bimj821>3.0.co;2-f
- Nov 1, 2001
- Biometrical Journal
It is well known that point estimates in group sequential designs are biased. This also applies to adaptive designs that enable, e.g., data-driven reassessment of group sample sizes. For triangular designs, Whitehead (1986) (Biometrika 73, 573–581) proposed a bias-adjusted estimate. However, although this estimate is feasible in group sequential designs, it is not feasible in adaptive designs. Furthermore, it wastes information because it does not use the stage at which the trial was stopped. We present a modification that does use this information and is applicable to adaptive designs. The modified estimate achieves an improvement in group sequential designs and shows similar results in adaptive designs.
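The bias that motivates such adjustments is easy to see by simulation. A minimal sketch, assuming a one-sample normal mean with unit variance, a two-stage design that stops for efficacy only when the stage-1 z-statistic exceeds a hypothetical bound, and the naive maximum likelihood estimate (the sample mean at stopping). Neither Whitehead's estimate nor the proposed modification is implemented here.

```python
import math
import random


def naive_bias(theta: float, n1: int = 50, n2: int = 50, z_stop: float = 2.8,
               n_sims: int = 100_000, seed: int = 7) -> float:
    """Monte Carlo bias of the naive sample mean after a two-stage design."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        mean1 = rng.gauss(theta, 1 / math.sqrt(n1))      # stage-1 mean (unit SD)
        if mean1 * math.sqrt(n1) >= z_stop:               # early stop for efficacy
            est = mean1
        else:
            mean2 = rng.gauss(theta, 1 / math.sqrt(n2))  # stage-2 mean
            est = (n1 * mean1 + n2 * mean2) / (n1 + n2)  # pooled mean
        total += est
    return total / n_sims - theta


print(naive_bias(0.3))                        # group sequential: upward bias
print(naive_bias(0.3, z_stop=float("inf")))  # no early stopping: approximately zero
```

The upward bias arises because the trial stops early precisely when the stage-1 estimate happens to be large.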
- Research Article
29
- 10.1016/s0149-7634(05)80104-6
- Mar 1, 1991
- Neuroscience and Biobehavioral Reviews
Stagewise, group sequential experimental designs for quantal responses. One-sample and two-sample comparisons
- Research Article
9
- 10.1177/0272989x211045036
- Dec 3, 2021
- Medical Decision Making
Introduction: Adaptive designs allow changes to an ongoing trial based on prespecified early examinations of accrued data. Opportunities are potentially being missed to incorporate health economic considerations into the design of these studies.
Methods: We describe how to estimate the expected value of sample information for group sequential design adaptive trials. We operationalize this approach in a hypothetical case study using data from a pilot trial. We report the expected value of sample information and expected net benefit of sampling results for 5 design options for the future full-scale trial including the fixed-sample-size design and the group sequential design using either the Pocock stopping rule or the O’Brien-Fleming stopping rule with 2 or 5 analyses. We considered 2 scenarios relating to 1) using the cost-effectiveness model with a traditional approach to the health economic analysis and 2) adjusting the cost-effectiveness analysis to incorporate the bias-adjusted maximum likelihood estimates of trial outcomes to account for the bias that can be generated in adaptive trials.
Results: The case study demonstrated that the methods developed could be successfully applied in practice. The results showed that the O’Brien-Fleming stopping rule with 2 analyses was the most efficient design with the highest expected net benefit of sampling in the case study.
Conclusions: Cost-effectiveness considerations are unavoidable in budget-constrained, publicly funded health care systems, and adaptive designs can provide an alternative to costly fixed-sample-size designs. We recommend that when planning a clinical trial, expected value of sample information methods be used to compare possible adaptive and nonadaptive trial designs, with appropriate adjustment, to help justify the choice of design characteristics and ensure the cost-effective use of research funding.
Highlights:
- Opportunities are potentially being missed to incorporate health economic considerations into the design of adaptive clinical trials.
- Existing expected value of sample information analysis methods can be extended to compare possible group sequential and nonadaptive trial designs when planning a clinical trial.
- We recommend that adjusted analyses be presented to control for the potential impact of the adaptive designs and to maintain the accuracy of the calculations.
- This approach can help to justify the choice of design characteristics and ensure the cost-effective use of limited research funding.
- Research Article
7
- 10.1007/s12561-017-9188-x
- Mar 15, 2017
- Statistics in Biosciences
Clinical trials with adaptive sample size re-assessment based on an analysis of the unblinded interim results (ubSSR) have gained in popularity because of uncertainty, at the start of the study, about the value of the treatment effect δ at which to power the trial. While the statistical methodology for controlling the type-1 error of such designs is well established, concerns remain that conventional group sequential designs with no ubSSR can accomplish the same goals with greater efficiency. However, it has been difficult to make this efficiency comparison precise and objective. In this paper, we present a methodology for making this comparison in a standard, well-accepted manner by plotting the unconditional power curves of the two approaches while holding their expected sample sizes equal, at each value of δ in the range of interest. Under reasonable decision rules for increasing the sample size (conservative promising zones, and no more than a 50% increase in sample size), there is little or no loss of efficiency for the adaptive designs in terms of unconditional power. The two approaches, however, have very different conditional power profiles. More generally, the methodology can be used to compare any design with ubSSR against a comparable group sequential design with no ubSSR, so that one can determine whether the efficiency loss, if any, of the ubSSR design is offset by the advantages it confers for re-powering the study at the time of the interim analysis.
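The promising-zone rule can be sketched with the conditional power of a one-sided final z-test under the current trend (the interim estimate taken as the true effect). The sketch assumes standardized unit-variance observations, a hypothetical lower zone limit cp_min = 0.36, a target of 0.9, and the 50% cap on the increase quoted in the abstract. It uses the unweighted final statistic and does not address type I error control after an increase, which in practice requires, for example, a weighted combination statistic or conditions under which the conventional test remains valid.

```python
from statistics import NormalDist

_STD = NormalDist()


def conditional_power(z1: float, n1: int, n_final: int, alpha: float = 0.025) -> float:
    """CP of a one-sided level-alpha final z-test, assuming the interim trend continues.

    Requires n_final > n1; observations are standardized to unit variance.
    """
    c = _STD.inv_cdf(1 - alpha)
    theta_hat = z1 / n1 ** 0.5                 # interim estimate of the standardized effect
    m = n_final - n1                           # observations still to be collected
    shortfall = c * n_final ** 0.5 - z1 * n1 ** 0.5 - theta_hat * m
    return 1 - _STD.cdf(shortfall / m ** 0.5)


def promising_zone_n(z1: float, n1: int, n_planned: int, cp_min: float = 0.36,
                     target: float = 0.9, max_factor: float = 1.5) -> int:
    """Final sample size: increase only if CP at n_planned lies in [cp_min, target)."""
    if not cp_min <= conditional_power(z1, n1, n_planned) < target:
        return n_planned                       # unfavorable or favorable zone
    n_cap = int(max_factor * n_planned)       # e.g. at most a 50% increase
    for n in range(n_planned, n_cap + 1):
        if conditional_power(z1, n1, n) >= target:
            return n
    return n_cap                               # target unreachable: use the cap


for z1 in (0.5, 1.5, 2.5):
    print(z1, round(conditional_power(z1, 50, 100), 3), promising_zone_n(z1, 50, 100))
```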
- Research Article
5
- 10.1186/s12874-022-01734-2
- Oct 1, 2022
- BMC Medical Research Methodology
Background: Assessing the long term effects of many surgical interventions tested in pragmatic RCTs may require extended periods of participant follow-up to assess effectiveness and use patient-reported outcomes that require large sample sizes. Consequently the RCTs are often perceived as being expensive and time-consuming, particularly if the results show the test intervention is not effective. Adaptive, and particularly group sequential, designs have great potential to improve the efficiency and cost of testing new and existing surgical interventions. As a means to assess the potential utility of group sequential designs, we re-analyse data from a number of recent high-profile RCTs and assess whether using such a design would have caused the trial to stop early.
Methods: Many pragmatic RCTs monitor participants at a number of occasions (e.g. at 6, 12 and 24 months after surgery) during follow-up as a means to assess recovery and also to keep participants engaged with the trial process. Conventionally one of the outcomes is selected as the primary (final) outcome, for clinical reasons, with others designated as either early or late outcomes. In such settings, novel group sequential designs that use data from not only the final outcome but also from early outcomes at interim analyses can be used to inform stopping decisions. We describe data from seven recent surgical RCTs (WAT, DRAFFT, WOLLF, FASHION, CSAW, FIXDT, TOPKAT), and outline possible group sequential designs that could plausibly have been proposed at the design stage. We then simulate how these group sequential designs could have proceeded, by using the observed data and dates to replicate how information could have accumulated and decisions been made for each RCT.
Results: The results of the simulated group sequential designs showed that for two of the RCTs it was highly likely that they would have stopped for futility at interim analyses, potentially saving considerable time (15 and 23 months) and costs and avoiding patients being exposed to interventions that were either ineffective or no better than standard care. We discuss the characteristics of RCTs that are important in order to use the methodology we describe, particularly the value of early outcomes and the window of opportunity when early stopping decisions can be made and how it is related to the length of recruitment period and follow-up.
Conclusions: The results for five of the RCTs tested showed that group sequential designs using early outcome data would have been feasible and likely to provide designs that were at least as efficient, and possibly more efficient, than the original fixed sample size designs. In general, the amount of information provided by the early outcomes was surprisingly large, due to the strength of correlations with the primary outcome. This suggests that the methods described here are likely to provide benefits more generally across the range of surgical trials and more widely in other application areas where trial designs, outcomes and follow-up patterns are structured and behave similarly.
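Why strongly correlated early outcomes carry so much information can be seen from a standard missing-data result. A minimal sketch, assuming bivariate normal early and final outcomes with known correlation rho and a regression-type estimator with a known slope for one arm's mean. If n_complete participants have the final outcome and n_pipeline further participants have only the early outcome, the variance of the final-outcome mean is (σ²/n_complete)·(1 - rho²·n_pipeline/(n_complete + n_pipeline)). The numbers are hypothetical, and estimating the slope would reduce the gain slightly.

```python
def information_ratio(n_complete: int, n_pipeline: int, rho: float) -> float:
    """Information with early-outcome data relative to using complete cases only."""
    frac_pipeline = n_pipeline / (n_complete + n_pipeline)
    return 1.0 / (1.0 - rho**2 * frac_pipeline)


for rho in (0.0, 0.5, 0.8, 0.95):
    print(rho, round(information_ratio(100, 100, rho), 3))
```

With as many pipeline participants as complete ones and rho = 0.8, the interim information is about 47% higher than from complete cases alone.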
- Research Article
- 10.1177/09622802251399914
- Nov 27, 2025
- Statistical methods in medical research
A parallel randomized trial is frequently used to investigate treatment effectiveness compared with the gold standard. In early-phase trials, a group sequential design has the potential to reduce the expected sample size compared with the traditional one-stage design, and to protect participants when a new treatment is not as effective as expected. When the outcome is binary, a group sequential design based on the exact binomial distribution is preferable to one based on the asymptotic limiting distribution. To improve design efficiency, we propose a new parallel two-stage adaptive design and a promising zone design, both allowing sample size adjustment in the second stage based on the outcome of the first stage. All of these designs control the type I error rate, but only the two proposed designs guarantee a conditional probability constraint when a trial proceeds to the second stage. We used a real example from a completed cancer trial to illustrate the application of the proposed designs. The adaptive design substantially increases unconditional power but requires a larger sample size than the group sequential design. The promising zone design achieves a good balance between statistical power and expected sample size.
- Research Article
17
- 10.1002/sim.3790
- Mar 8, 2010
- Statistics in Medicine
Confirmatory clinical trials comparing the efficacy of a new treatment with an active control typically aim at demonstrating either superiority or non-inferiority. In the latter case, the objective is to show that the experimental treatment is not worse than the active control by more than a pre-specified non-inferiority margin. We consider two classes of group-sequential designs that combine the superiority and non-inferiority objectives: non-adaptive designs with fixed group sizes and adaptive designs where future group sizes may be based on the observed treatment effect. For both classes, we derive group-sequential designs meeting error probability constraints that have the lowest possible expected sample size averaged over a set of values of the treatment effect. These optimized designs provide an efficient means of reducing expected sample size under a range of treatment effects, even when the separate objectives of proving superiority and non-inferiority would require quite different fixed sample sizes. We also present error spending versions of group-sequential designs that are easily implementable and can handle unpredictable group sizes or information levels. We find the adaptive choice of group sizes to yield some modest efficiency gains; alternatively, expected sample size may be reduced by adding another interim analysis to a non-adaptive group-sequential design.
- Research Article
11
- 10.1002/pst.1599
- Sep 23, 2013
- Pharmaceutical Statistics
Two-stage clinical trial designs may be efficient in pharmacogenetics research when there is some but inconclusive evidence of effect modification by a genomic marker. Two-stage designs allow early stopping for efficacy or futility and can offer the additional opportunity to enrich the study population to a specific patient subgroup after an interim analysis. This study compared sample size requirements for fixed parallel group, group sequential, and adaptive selection designs with equal overall power and control of the family-wise type I error rate. The designs were evaluated across scenarios that defined the effect sizes in the marker positive and marker negative subgroups and the prevalence of marker positive patients in the overall study population. Effect sizes were chosen to reflect realistic planning scenarios, where at least some effect is present in the marker negative subgroup. In addition, scenarios were considered in which the assumed 'true' subgroup effects (i.e., the postulated effects) differed from those hypothesized at the planning stage. As expected, both two-stage designs generally required fewer patients than a fixed parallel group design, and the advantage increased as the difference between subgroups increased. The adaptive selection design added little further reduction in sample size, compared with the group sequential design, when the postulated effect sizes were equal to those hypothesized at the planning stage. However, when the postulated effects deviated strongly in favor of enrichment, the comparative advantage of the adaptive selection design increased, which precisely reflects the adaptive nature of the design.
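A back-of-the-envelope comparison shows why the benefit of enrichment grows with the difference between subgroups. A minimal sketch for fixed single-stage designs only, assuming a normal outcome with common SD, 1:1 allocation, a two-sided z-test, and an overall effect equal to the prevalence-weighted average of the subgroup effects. It ignores the multiplicity adjustment needed for family-wise error control as well as the interim analysis, and its names and inputs are hypothetical.

```python
from statistics import NormalDist


def total_n(delta: float, sigma: float = 1.0, alpha: float = 0.05, power: float = 0.8) -> float:
    """Total sample size (both arms, before rounding up) for a two-sided two-sample z-test."""
    z = NormalDist().inv_cdf
    return 4 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2


def full_vs_enriched(delta_pos: float, delta_neg: float, prevalence: float) -> dict:
    """Compare an all-comers trial with a marker-positive-only trial."""
    delta_all = prevalence * delta_pos + (1 - prevalence) * delta_neg
    n_pos = total_n(delta_pos)
    return {
        "all_comers_randomised": total_n(delta_all),
        "enriched_randomised": n_pos,
        "enriched_screened": n_pos / prevalence,   # patients screened to find n_pos positives
    }


for delta_neg in (0.4, 0.2, 0.0):
    print(delta_neg, {k: round(v) for k, v in full_vs_enriched(0.4, delta_neg, 0.5).items()})
```

Because the required size scales with 1/δ², diluting the effect in an all-comers population quickly inflates the sample size as the marker negative effect shrinks.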
- Research Article
11
- 10.1097/pcc.0000000000003273
- May 17, 2023
- Pediatric critical care medicine : a journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies
This systematic review investigates the use of adaptive designs in randomized controlled trials (RCTs) in pediatric critical care. PICU RCTs published between 1986 and 2020 were identified from the www.PICUtrials.net database, and MEDLINE, EMBASE, CENTRAL, and LILACS were searched (March 9, 2022) to identify RCTs published in 2021. PICU RCTs using adaptive designs were identified through an automated full-text screening algorithm. All RCTs involving children (< 18 yr old) cared for in a PICU were included, with no restrictions on disease cohort, intervention, or outcome. Interim monitoring by a Data and Safety Monitoring Board that was not prespecified to change the trial design or implementation of the study was not considered adaptive. We extracted the type of adaptive design, the justification for the design, and the stopping rule used. Trial characteristics were also extracted, and the results were summarized through narrative synthesis. Risk of bias was assessed using the Cochrane Risk of Bias Tool 2. Sixteen of 528 PICU RCTs (3%) used adaptive designs, with two types of adaptation: group sequential design and sample size re-estimation. Of the 11 trials that used a group sequential design, seven stopped early for futility and one stopped early for efficacy. Of the seven trials that performed a sample size re-estimation, the estimated sample size decreased in three trials and increased in one. There was little evidence of the use of adaptive designs: only 3% of PICU RCTs incorporated an adaptive design, and only two types of adaptation were used. Identifying the barriers to adoption of more complex adaptive trial designs is needed.