Stepped-wedge cluster randomized trials tend to require fewer clusters than standard parallel-arm designs because clusters switch from the control to the intervention condition, but no recommendations exist for the minimum number of clusters. Trials randomizing an extremely small number of clusters are not uncommon, yet the justification for so few clusters is often unclear and appropriate analysis is often lacking. In addition, stepped-wedge cluster randomized trials are methodologically more complex because of their longitudinal correlation structure, and ignoring the distinct within- and between-period intracluster correlations can underestimate the required sample size in small stepped-wedge cluster randomized trials. We conducted a review of published small stepped-wedge cluster randomized trials to understand how and why they are used, and to characterize the approaches used in their design and analysis.
Electronic searches were used to identify primary reports of full-scale stepped-wedge cluster randomized trials published between 2016 and 2022; from these, the subset that randomized two to six clusters was identified. Two reviewers independently extracted information from each report and any available protocol, and resolved disagreements through discussion.
We identified 61 stepped-wedge cluster randomized trials that randomized two to six clusters, with a median sample size of 1426 participants (Q1-Q3: 420-7553). Twelve (19.7%) gave some indication that the evaluation was considered "preliminary", and 16 (26.2%) recognized the small number of clusters as a limitation. Sixteen (26.2%) explained the limited number of clusters; common explanations were the need to minimize contamination (e.g. by merging adjacent units), limited availability of clusters, and logistical considerations. The majority (51, 83.6%) presented sample size or power calculations, but only one assumed distinct within- and between-period intracluster correlations.
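To make the sample size point concrete, the sketch below computes the variance of the treatment effect estimator for a small stepped-wedge design analysed on cluster-period means by generalized least squares with fixed period effects. It assumes a cross-sectional design with `m` participants per cluster-period, one cluster per sequence, total variance 1, and a nested exchangeable correlation structure with within-period intracluster correlation `rho_w` and between-period intracluster correlation `rho_b`; setting `rho_b = rho_w` recovers the single-correlation (Hussey and Hughes) model. The function names and inputs are hypothetical illustrations rather than the review's own method, and the normal-approximation power is likely optimistic with so few clusters.

```python
import numpy as np
from statistics import NormalDist


def sw_treatment_matrix(n_clusters):
    """Hypothetical helper: one cluster per sequence, one baseline period.

    Row i is cluster i; cluster i switches to the intervention at period i + 1.
    """
    n_periods = n_clusters + 1
    X = np.zeros((n_clusters, n_periods))
    for i in range(n_clusters):
        X[i, i + 1:] = 1.0
    return X


def var_theta(X, m, rho_w, rho_b, sigma2=1.0):
    """GLS variance of the treatment effect estimated from cluster-period means.

    Assumes a cross-sectional design, m participants per cluster-period,
    fixed period effects, and a nested exchangeable correlation structure.
    """
    n_clusters, n_periods = X.shape
    var_mean = sigma2 * (1 + (m - 1) * rho_w) / m  # variance of one cluster-period mean
    cov_between = sigma2 * rho_b  # covariance of two means from the same cluster, different periods
    V = (var_mean - cov_between) * np.eye(n_periods) + cov_between * np.ones((n_periods, n_periods))
    V_inv = np.linalg.inv(V)
    info = np.zeros((n_periods + 1, n_periods + 1))
    for i in range(n_clusters):
        Z = np.column_stack([np.eye(n_periods), X[i]])  # period effects + treatment indicator
        info += Z.T @ V_inv @ Z
    return np.linalg.inv(info)[-1, -1]


def approx_power(effect, variance, alpha=0.05):
    """Two-sided normal-approximation power, ignoring the opposite rejection tail."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / variance ** 0.5 - z)


X = sw_treatment_matrix(4)
v_single = var_theta(X, m=50, rho_w=0.05, rho_b=0.05)   # within = between (single ICC)
v_nested = var_theta(X, m=50, rho_w=0.05, rho_b=0.025)  # between-period ICC half as large
print(v_single, v_nested)
print(approx_power(0.2, v_single), approx_power(0.2, v_nested))
```

Under these assumed inputs, halving the between-period correlation roughly doubles the variance of the treatment effect estimator, so a calculation that assumes a single intracluster correlation would understate the number of clusters needed.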
Few (10, 16.4%) used restricted randomization methods, and more than half (34, 55.7%) identified baseline imbalances. The most common analysis method was the generalized linear mixed model (44, 72.1%). Only four trials (6.6%) reported statistical analyses that accounted for the small number of clusters: one used generalized estimating equations with a small-sample correction, two used generalized linear mixed models with small-sample corrections, and one used a Bayesian analysis. Another eight (13.1%) used fixed-effects regression, whose performance in stepped-wedge cluster randomized trials with few clusters requires further evaluation. None used permutation tests or cluster-period level analysis.
Methods appropriate for the design and analysis of small stepped-wedge cluster randomized trials have not been widely adopted in practice. Greater awareness is needed that standard sample size calculation methods can yield spuriously low numbers of required clusters. Generalized estimating equations or generalized linear mixed models with small-sample corrections, Bayesian approaches, and permutation tests may be more appropriate for analysing small stepped-wedge cluster randomized trials. Future research is needed to establish best practices for stepped-wedge cluster randomized trials with a small number of clusters.
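Permutation tests and cluster-period level analysis can be combined. The sketch below is one hedged illustration: an exact permutation (randomization) test on simulated cluster-period means, whose test statistic is the treatment coefficient from a fixed-effects regression with cluster and period indicators. It assumes one cluster per sequence and re-randomizes clusters to sequences under the null hypothesis; the data, seed, effect sizes, and function names are hypothetical, and this choice of statistic is only one of several reasonable options.

```python
import itertools

import numpy as np


def sw_design(n_clusters):
    """Hypothetical helper: row s is the treatment pattern of sequence s (one baseline period)."""
    X = np.zeros((n_clusters, n_clusters + 1))
    for s in range(n_clusters):
        X[s, s + 1:] = 1.0
    return X


def fe_effect(Y, X):
    """Treatment coefficient from OLS of cluster-period means on cluster
    indicators, period indicators (first period as reference), and treatment."""
    n_clusters, n_periods = Y.shape
    rows = []
    for i in range(n_clusters):
        for j in range(n_periods):
            cluster = np.eye(n_clusters)[i]
            period = np.eye(n_periods)[j][1:]  # drop first period to avoid collinearity
            rows.append(np.concatenate([cluster, period, [X[i, j]]]))
    coef, *_ = np.linalg.lstsq(np.array(rows), Y.ravel(), rcond=None)
    return coef[-1]


def permutation_test(Y, X):
    """Exact two-sided permutation test over all cluster-to-sequence allocations."""
    observed = fe_effect(Y, X)
    n_clusters = Y.shape[0]
    stats = np.array([fe_effect(Y, X[list(perm)])
                      for perm in itertools.permutations(range(n_clusters))])
    # Share of allocations whose statistic is at least as extreme as the observed one
    p_value = np.mean(np.abs(stats) >= abs(observed) - 1e-12)
    return observed, p_value


# Simulated illustration only: 4 clusters, 5 periods, assumed true effect 0.8
rng = np.random.default_rng(2024)
X = sw_design(4)
cluster_effects = rng.normal(0.0, 0.5, size=(4, 1))
period_effects = np.linspace(0.0, 0.4, 5)
Y = 1.0 + cluster_effects + period_effects + 0.8 * X + rng.normal(0.0, 0.2, size=X.shape)
estimate, p_value = permutation_test(Y, X)
print(estimate, p_value)
```

With four clusters there are only 4! = 24 possible allocations, so the smallest attainable two-sided p-value is 1/24 (about 0.042); with six clusters there are 720, which is still small enough to enumerate exactly.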