Abstract

The comparison of allocated treatments is a foundation for many studies in soil science. We often want to determine whether a treatment applied to a soil has a significant effect compared with not applying that treatment (the control). Typically, a statistical test is performed to establish whether there is a true difference between the treatment and control means, with the null hypothesis stating that there is no difference. Two types of error can be made when a statistical test is conducted; these are presented in Table 1. In soil science research there is an overwhelming focus on controlling the Type I error (i.e., the value of α). When a comparison test yields a non-significant result, the usual reaction is to conclude that there was no true effect of applying the treatment to the soil. What is often ignored is the risk of Type II error; that is, the failure to detect a true difference. In this case, the power of the test was not sufficient to detect the difference that existed. Power is defined as the probability of rejecting the null hypothesis when it is false. This probability depends on the magnitude of the true difference between the treatment means: a larger difference is easier to detect and thus yields higher power. A Type II error occurs when a false null hypothesis is not rejected, and the risk of committing this error is higher when the number of treatment replicates is low (Barker Bausell and Li 2002).

Soils are highly spatially variable at multiple scales, both laterally and vertically, and analytical errors in the laboratory impose additional variation. The number of treatment replicates must therefore be large enough that the effect of the treatment can override this inherent variability. Without sufficient replication to reduce the influence of this variation, a false null hypothesis will not be rejected.

An example of a research area in soil science where Type II error is common is the assessment of management effects on soil organic carbon (SOC) content. Typically, long-term agroecosystem experiments (LTAEs) have been used to assess management effects on SOC (e.g., Janzen et al. 1998; VandenBygaart et al. 2011). The LTAEs were usually initiated not for assessing SOC differences but for agronomic purposes. They are typically randomized block designs with relatively small plot sizes, such that blocking effects can yield large variation in SOC between replicated plots (Ellert et al. 2007). Expected changes in SOC due to management practices can be small and often require decades to detect (Janzen et al. 1998). This small effect size, coupled with large variation between replicated plots, lowers the power and increases the chance of Type II error. Indeed, in some instances a management practice may truly affect SOC, yet the power to detect the difference in an LTAE is low. Where no difference is detected between treatments, the researcher may be inclined to accept that there was no difference and leave it at that. However, reaching such a conclusion without an adequate statistical power analysis risks missing a difference that was actually present. Recently, there has been concern that such errors can lead to erroneous interpretations of data, which can have implications for agricultural policy (e.g., VandenBygaart 2009).
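To make the roles of effect size and replication concrete, the following sketch (in Python, using the statsmodels package) computes the power of a two-sample t test and the replication needed to reach 80% power. The SOC effect size and between-plot standard deviation are hypothetical values chosen only to illustrate the calculation, not figures from any of the studies cited above.

    # Hypothetical example: management is expected to change SOC by
    # 2 Mg C/ha and the between-plot standard deviation is 4 Mg C/ha,
    # giving a standardized effect size (Cohen's d) of 0.5.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    effect_size = 2.0 / 4.0  # Cohen's d

    # Power achieved with 4 replicate plots per treatment at alpha = 0.05
    power = analysis.power(effect_size=effect_size, nobs1=4, alpha=0.05)
    print(f"Power with 4 replicates: {power:.2f}")  # roughly 0.1

    # Replicate plots per treatment needed to reach 80% power
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"Replicates needed for 80% power: {n:.0f}")  # roughly 64

With only four replicates per treatment, the chance of detecting this difference is on the order of 10%, i.e., a Type II error rate near 90%. Reducing the error variance (and hence increasing the standardized effect size) through blocking and covariates, as discussed below, lowers the replication required.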
Kravchenko and Robertson (2011) highlight examples from the recent literature where a lack of statistical power can lead to erroneous interpretation of results. There are several approaches that soil scientists can take to ensure that Type II errors are reduced to acceptable levels. The first is to control experimental variation amongst plots effectively. This is achieved by effective blocking in the design of the experiment and by the use of covariates in the statistical analysis to control variation not removed by blocking. The second is to determine the number of replicate plots needed to ensure that power is acceptably high, after the experimental error variance has been reduced to a minimum through effective blocking and covariate analysis. One may also exploit "hidden replication", through the use of factorial experiments, to increase power (e.g., Astatkie et al. 2006).

Blocking is very widely used in field experiments. Commonly used blocked designs include complete and incomplete blocks, Latin squares, and split plots. Many excellent references exist to guide the choice of experimental design (for example, Kuehl 2000). Suffice it to say that the goal in laying out blocks is always to minimize the variation amongst plots within a block, and hence to maximize the variation amongst blocks, prior to applying the treatments. Of course, the plots within a
