Abstract

The stepped wedge design is a new type of randomized controlled trial (RCT), which is increasingly being used, especially in the evaluation of service delivery interventions (SDIs). Kotz et al. [1] have recently criticized a number of the advantages often cited as reasons why the stepped wedge design is so appealing. No single trial design is appropriate for all evaluations, and we believe that many of the points raised by Kotz et al. are valid; nevertheless, the conclusion that the stepped wedge design cannot be recommended in practice is shortsighted.

In many settings, particularly in the area of SDIs, service providers often implement new interventions without full evidence of effectiveness, frequently arguing themselves that the intervention is highly likely to be effective. For this and other pragmatic reasons, agreement to randomize in order to evaluate effectiveness often cannot be reached in these settings. In our experience, however, service providers are much more likely to agree to a phased but randomized implementation. Therefore, although the stepped wedge may not be the most appealing design (we argue below that it is nonetheless an efficient design, likely to be robust, and perfectly appropriate in many settings), it does permit a randomized evaluation that would otherwise not be feasible.

Kotz et al. also argue that the design requires multiple repeat measures, which may be burdensome. This will be true in some settings but not in others: consider an intervention implemented at ward level in which a different cohort of patients is observed during each period, so that each patient contributes only one observation.

Kotz et al. further describe the stepped wedge design as if data must be collected at each and every step. Stepped wedge designs can be more flexible than this, and it is perfectly possible to design studies in which data are not collected from every cluster at every step: for example, omitting the period immediately preceding the change in treatment so that follow-up can be completed, or collecting data only within a fixed number of periods before and after the change in treatment.

Finally, Kotz et al. make the point that the increase in power of stepped wedge studies is brought about by an increase in the number of observations and not by any increase in efficiency. We believe this point is more subtle. In some cases, the within-cluster information on the treatment effect, which is not available in a cluster RCT, provides a substantial improvement in efficiency. We believe this will be particularly true for high values of the intracluster correlation coefficient (ICC), because the within-cluster information is not affected by the between-cluster variability that the ICC represents. Kotz et al. implicitly suggest that the number of clusters is the primary consideration in trial design, which is certainly true of cluster RCTs from a statistical perspective. However, the overall number of patients can be the primary consideration in terms of cost and ethics.
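To make the within-cluster argument concrete, here is a brief sketch (ours, as an illustration) using the cross-sectional mixed model of Hussey and Hughes [3], whose notation we follow:

    \[
    Y_{ijk} = \mu + \alpha_i + \beta_j + \theta X_{ij} + e_{ijk},
    \qquad \alpha_i \sim N(0, \tau^2), \quad e_{ijk} \sim N(0, \sigma^2),
    \]

where \(\alpha_i\) is the random effect of cluster \(i\), \(\beta_j\) the effect of period \(j\), \(X_{ij}\) the treatment indicator, and \(n\) the number of observations per cluster-period. Because every cluster contributes both control and intervention periods, a within-cluster contrast of cluster-period means eliminates the cluster effect:

    \[
    \bar{Y}_{ij} - \bar{Y}_{ij'}
    = (\beta_j - \beta_{j'}) + \theta\,(X_{ij} - X_{ij'}) + (\bar{e}_{ij} - \bar{e}_{ij'}),
    \]

so its variance, \(2\sigma^2/n\), is free of \(\tau^2\). In a parallel cluster RCT, by contrast, every comparison is made between clusters and therefore carries the design effect \(1 + (m - 1)\rho\), where \(m\) is the cluster size and \(\rho = \tau^2/(\tau^2 + \sigma^2)\) is the ICC; this is why the relative advantage of within-cluster information grows with the ICC.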
Trials where the number of clusters is fixed (for example, the number of hospitals in a given geographical area) but the number of patients per cluster is flexible may provide contexts where the stepped wedge trial is more efficient than a cluster RCT. For example, suppose we consider a trial designed to detect a standardized effect size of 0.2, at the 5% significance level, with an ICC of 0.02 and just 20 clusters available. Using the Stata command clustersampsi [2] to compute power under the parallel cluster design, and the formula presented elsewhere [3] to compute power under the stepped wedge design, we show that the stepped wedge trial provides greater power for the same total number of observations (Table 1; an approximate reproduction of this calculation is sketched below).

Table 1. Power available to detect a standardized effect size of 0.2, with 20 clusters available and an ICC of 0.02

                                   Stepped wedge design   Cluster design
    Number of clusters in total    20                     20
    Cluster size (per period)      5                      105
    Number of steps                20                     0
    Total number of measurements   2,100                  2,100
    Power (%)                      85.0                   69.8

Abbreviation: ICC, intracluster correlation coefficient.

Problems arising from lack of concealment or lack of blinding do pose a potential threat. Indeed, in many early cluster trials, where these issues were not fully understood, some studies were shown to be at risk of bias [4]. In stepped wedge studies, similar biases may be possible, and the potential for such biases, and the ways to mitigate them, are not well researched.
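As a check on Table 1, the following minimal sketch (ours, in Python rather than the Stata command used for the published calculation) computes power for the parallel design from the usual design effect and for the stepped wedge from the variance formula of Hussey and Hughes [3], assuming a cross-sectional design with one baseline period, one cluster crossing over at each of the 20 steps, and a standardized outcome (total variance 1):

    from math import sqrt
    from statistics import NormalDist

    Z = NormalDist()  # standard normal distribution

    def power_parallel(clusters, m, delta, rho, alpha=0.05):
        """Two-arm parallel cluster RCT, equal allocation, m patients per cluster."""
        k = clusters // 2                    # clusters per arm
        deff = 1 + (m - 1) * rho             # design effect for clustering
        var = 2 * deff / (k * m)             # variance of the difference in means
        return Z.cdf(abs(delta) / sqrt(var) - Z.inv_cdf(1 - alpha / 2))

    def power_stepped_wedge(clusters, n, steps, delta, rho, alpha=0.05):
        """Cross-sectional stepped wedge: one baseline period, one cluster
        crossing over per step, n observations per cluster-period."""
        I, T = clusters, steps + 1           # T periods in total
        tau2 = rho                           # between-cluster variance
        sig2 = (1 - rho) / n                 # variance of a cluster-period mean
        # X[i][j] = 1 if cluster i is treated in period j (cluster i crosses at step i+1)
        X = [[1 if j > i else 0 for j in range(T)] for i in range(I)]
        U = sum(map(sum, X))                                       # total treated cells
        W = sum(sum(X[i][j] for i in range(I)) ** 2 for j in range(T))
        V = sum(sum(row) ** 2 for row in X)
        # Variance of the treatment effect estimator, Hussey and Hughes [3]
        var = (I * sig2 * (sig2 + T * tau2)) / (
            (I * U - W) * sig2 + (U ** 2 + I * T * U - T * W - I * V) * tau2
        )
        return Z.cdf(abs(delta) / sqrt(var) - Z.inv_cdf(1 - alpha / 2))

    print(f"parallel cluster RCT: {power_parallel(20, 105, 0.2, 0.02):.3f}")       # ~0.74
    print(f"stepped wedge:        {power_stepped_wedge(20, 5, 20, 0.2, 0.02):.3f}") # ~0.86

Under the normal approximation this gives roughly 86% power for the stepped wedge and 74% for the parallel design; the slightly lower figures in Table 1 plausibly reflect rounding and small-sample (t-based) corrections applied by the software used for the published calculation.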
References

[1] Kotz D, Spigt M, Arts IC, Crutzen R, Viechtbauer W. Use of the stepped wedge design cannot be recommended: a critical appraisal and comparison with the classic cluster randomized controlled trial design. J Clin Epidemiol 2012;65:1249-1252.
[2] Hemming K, Marsh JL. A menu-driven facility for sample size calculations in cluster randomized controlled trials. Stata J 2013;13:114-135.
[3] Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007;28:182-191.
[4] Eldridge S, Kerry S, Torgerson DJ. Bias in identifying and recruiting participants in cluster randomised trials: what can be done? BMJ 2009;339:b4006.
