Cluster randomised trials with different numbers of measurements at baseline and endline: Sample size and optimal allocation.

Andrew J Copas, Richard Hooper

doi:10.1177/1740774519873888

Abstract

Published methods for sample size calculation for cluster randomised trials with baseline data are inflexible: they mostly assume that equal amounts of data are collected at baseline and endline, that is, before and after the intervention has been implemented in some clusters. We extend these methods to any amount of baseline and endline data. We explain how to explore sample size for a trial if some baseline data from the trial clusters have already been collected as part of a separate study. Where such data are not available, we show how to choose the proportion of data collection devoted to the baseline within the trial, when a particular cluster size or range of cluster sizes is proposed. We provide a design effect given the cluster size and correlation parameters, assuming that different participants are assessed at baseline and endline in the same clusters. We show how to produce plots that identify the impact of varying the amount of baseline data, accounting for the inevitable uncertainty in the cluster autocorrelation. We illustrate the methodology using an example trial. Baseline data provide more power, or allow a greater reduction in trial size, with greater values of the cluster size, intracluster correlation and cluster autocorrelation. Investigators should think carefully before collecting baseline data in a cluster randomised trial if this comes at the expense of endline data. In some scenarios, doing so will increase the sample size required to achieve a given power or precision.
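To make the trade-off between baseline and endline data concrete, the following is a minimal sketch, not the authors' published formula. It assumes a continuous outcome, a large-sample analysis of covariance on cluster-level means, m endline and n baseline participants per cluster (different individuals in each period), intracluster correlation rho and cluster autocorrelation pi. Under these assumptions the baseline and endline cluster means have correlation r = rho*pi / sqrt(((1 + (m-1)rho)/m)((1 + (n-1)rho)/n)), and the design effect is (1 + (m-1)rho)(1 - r^2). With n = m this reduces to the familiar equal-size expression r = m*rho*pi / (1 + (m-1)rho). The function names design_effect, relative_variance and best_baseline_split are hypothetical.

```python
import math


def design_effect(m: int, n: int, icc: float, cac: float) -> float:
    """Hypothetical helper: design effect for a large-sample ANCOVA on cluster
    means with m endline and n baseline participants per cluster, assuming
    different individuals are measured in each period.

    icc: intracluster correlation (rho); cac: cluster autocorrelation (pi).
    Relative to an individually randomised trial with m endline participants
    per cluster.
    """
    if m < 1 or n < 0:
        raise ValueError("need m >= 1 endline and n >= 0 baseline participants")
    # Variance of the endline cluster mean, in units of the total variance.
    var_end = (1 + (m - 1) * icc) / m
    if n == 0:
        r = 0.0  # no baseline data, so there is nothing to adjust for
    else:
        var_base = (1 + (n - 1) * icc) / n
        # The two cluster means share covariance icc * cac (between-cluster
        # variance times cluster autocorrelation); r is their correlation.
        r = icc * cac / math.sqrt(var_end * var_base)
    # Adjusting for the baseline cluster mean removes a fraction r^2 of the variance.
    return (1 + (m - 1) * icc) * (1 - r ** 2)


def relative_variance(m: int, n: int, icc: float, cac: float) -> float:
    """Proportional to the variance of the treatment effect estimate per cluster."""
    return design_effect(m, n, icc, cac) / m


def best_baseline_split(total: int, icc: float, cac: float) -> int:
    """Number of baseline participants, out of `total` per cluster, that
    minimises the variance while keeping at least one endline participant."""
    return min(range(total), key=lambda n: relative_variance(total - n, n, icc, cac))


if __name__ == "__main__":
    # Sensitivity to the uncertain cluster autocorrelation for a fixed
    # total of 100 measurements per cluster (illustrative values only).
    for cac in (0.2, 0.5, 0.8):
        n = best_baseline_split(100, icc=0.05, cac=cac)
        print(f"cac={cac}: baseline {n}, endline {100 - n}")
```

Scanning the split over a range of plausible autocorrelations mirrors the abstract's advice: when the cluster autocorrelation is zero, any baseline measurement only takes data away from the endline. Baseline data become worthwhile as the cluster size, intracluster correlation and cluster autocorrelation grow.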
