Design and analysis of efficacy evaluation trials

EPPO Bulletin, Volume 37, Issue 1, p. 11-24. First published: 18 April 2007. https://doi.org/10.1111/j.1365-2338.2007.01068.x

European and Mediterranean Plant Protection Organization
Organisation Européenne et Méditerranéenne pour la Protection des Plantes

PP 1/152 (3)

Specific scope

This standard is intended for use in association with EPPO Standards of set PP 1 (Standards for the efficacy evaluation of plant protection products) and provides detailed advice on the design and analysis of efficacy evaluation trials.

Specific approval and amendment

First approved in 1989-09. First revision approved in 1998-09. Second revision approved in 2006-09.

Introduction

This standard is intended to provide general background information on the design and analysis of efficacy evaluation trials. The EPPO Standards for the efficacy evaluation of plant protection products provide more detailed instructions on such trials for individual host/pest combinations. The set-up of a trial is first considered (experimental design, plot size and layout, role and location of untreated controls). The nature of the observations to be made is then reviewed (types of variables, modes of observation). Finally, suggestions are made on the statistical analysis of the results of a trial and of a trial series (estimates of effects, choice of the statistical test, transformation of variables).
Appendix 1 gives examples of scales used in the EPPO standards. What follows is intended to give an outline of good statistical practice in the analysis of data. It is not, and cannot be, a prescription for all analyses, nor can it cover all situations. Practitioners should never underestimate the need for professional statistical advice. It is important for practitioners to understand the advice they receive, and it is often better for them to perform a simple analysis that they can report and defend with confidence than to accept advice that leads to an analysis they may understand only partially. The bibliography at the end of these standards may be helpful: it gives several good texts that attempt to convey the principles of good statistical practice, rather than to provide a series of statistical recipes to be followed blindly.

1. Experimental design

1.1 Experimental scope and objectives

Before the design of a trial is considered, its scope and objectives should be defined clearly, because these constrain the available choices of design. In practice, an iterative process is often used: scope and objectives are gradually adjusted to fit the experimental resources available. It is vital that the scope and objectives be updated to reflect decisions made during this process. The scope of the trial reflects the range of practical outcomes that may result from the trial and which are relevant to its objectives. Part of the scope relates to the population which the trial is sampling. Another part determines the range of environmental conditions, crops, treatment chemicals, application methods and target pests which the trial is intended to test. The scope defines the context in which the experimental units and observations are studied. The objectives of the trial should be in the form of questions about the treatments to which answers are desired. Typical answers will be ‘yes’ or ‘no’, a ranking of treatments or an estimate of a value.
The scope and objectives should form part of the trial protocol, as described in EPPO Standard PP 1/181 Conduct and reporting of efficacy evaluation trials, including good experimental practice. The planned experimental methods, design and analysis described below should also form part of the protocol.

1.2 Types of design

EPPO Standards for the efficacy evaluation of plant protection products envisage trials in which the experimental treatments are the ‘test product(s), reference product(s) and untreated control, arranged in a suitable statistical design’. It is also envisaged that the products may be tested at different doses and/or application times. This applies particularly to the use of a higher dose in selectivity trials, and to dose–response studies in general. Mono-factorial designs are appropriate for trials if the test product(s), reference product(s) and untreated control can be considered as different levels of a single factor, and if there are no other factors that require study. However, if, for example, the effect of each product in an efficacy trial is to be studied at different doses, then a factorial design may be used with, in general, all possible combinations of treatments from both factors represented. In this way, important interactions between the factors may be revealed and estimated. The principal randomized designs likely to be used are the completely randomized design and the randomized complete block design. These are illustrated below on the basis of a mono-factorial example with eight treatments, i.e. five different test products, two reference products and an untreated control; each treatment is replicated four times.

1.2.1 Completely randomized design

The treatments in a completely randomized design (Fig. 1) are assigned at random to the experimental units.
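The random assignment just described can be sketched in code. The following Python fragment is an illustrative sketch only; the function name and the use of a fixed seed are my own choices, not part of the standard. It draws one completely randomized layout for the example of eight treatments replicated four times:

```python
import random

def completely_randomized(treatments, replicates, seed=None):
    """Assign treatment labels to plots completely at random."""
    rng = random.Random(seed)
    # One label per plot: each treatment repeated once per replicate.
    plots = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(plots)  # every ordering of the 32 labels is equally likely
    return plots

# Eight treatments (five test products, two reference products and an
# untreated control), each replicated four times -> 32 plots.
layout = completely_randomized(list(range(1, 9)), 4, seed=1)
print(layout)
```

In practice the shuffled list would be mapped onto the physical plot grid row by row; a reproducible seed is useful only for documenting the randomization, not for choosing a "good-looking" layout.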
This design is potentially the most powerful statistically (in the sense that there is a maximum chance of detecting a significant difference if it exists), because it allows retention of the maximum number of degrees of freedom for the residual variance. However, it is suitable only if the trial area is known to offer a homogeneous environment. If there is considerable heterogeneity between different parts of the trial area, residual variance may become unacceptably high, and it is better to use a design that accounts for this, such as a randomized complete block.

Figure 1. A fully randomized design. Each treatment (labelled 1–8) is replicated four times; individual treatment labels are assigned completely randomly to the 32 plots.

1.2.2 Randomized complete block design

A block is a group of plots within which the environment relevant to the observations to be made is homogeneous. In this design, the blocks are laid out deliberately so that plots within them are as uniform as possible before application of treatments. Usually, each treatment appears once, and once only, within each block. The treatments are distributed randomly to the plots within the blocks, which act as replicates; the arrangement of treatments should be randomized separately for each block. In the following examples (Figs 2–4), there are four blocks and eight treatments. The layout of the blocks aims to control the heterogeneity of the site (e.g. slope, direction of work at sowing or planting, exposure, degree of infestation), of the plants (size, age, vigour) or of the conditions occurring during the experiment (application of treatments, assessments). The layout of the blocks therefore requires some preliminary knowledge of the trial area. The arrangement of plots within blocks may be influenced by plot shape: long narrow plots are often arranged side by side, whereas square plots may be laid out in other ways.
However, blocks do not have to be placed side by side. If there is good preliminary knowledge of a field, this may be utilized by scattering blocks across the field, to account for previously observed heterogeneity (Figs 5 and 6). Although it is quite possible that, in a randomized layout, treatments within a replicate may appear in treatment order, this is to be avoided wherever possible in the interests of unbiased evaluation. If there is extremely good preliminary knowledge, and it can be confidently assumed that conditions will remain the same for the experiment to be done, complex heterogeneity may be allowed for, and it is not even necessary for plots of the same block to be adjacent. For example, blocks may be broken up to account for a known patchy infestation of nematodes. In Fig. 6, plots within block 1 have been deliberately placed at points of visibly low infestation and plots within block 2 at points of visibly high infestation.

Figure 2. Possible arrangement of blocks and plots in randomized blocks in field trials. An environmental gradient down the field is accounted for, either by arranging blocks down the gradient, or by placing blocks side by side. In each case, plots within blocks placed across the gradient are affected equally by the environmental variable.

Figure 3. Possible arrangement of blocks and plots in randomized blocks in field trials. An alternative form of randomized block design for the situation when there is no obvious environmental gradient, but where heterogeneity is suspected because the maximum distance between plots within a block is relatively large. Here, the eight plots are arranged relatively close together in a 4 × 2 rectangle, and the blocks are placed side by side.

Figure 4. Another example of an arrangement of blocks and plots when, as in Fig. 3, heterogeneity is suspected but there is no obvious environmental gradient.
Here, the eight plots are again arranged relatively close together in a 4 × 2 rectangle, but the blocks themselves are arranged in a 2 × 2 grid.

Figure 5. Possible arrangement of blocks and plots in randomized blocks in field trials. Blocks scattered across the field, according to previously observed heterogeneity.

Figure 6. Possible arrangement of blocks and plots in randomized blocks in field trials. Blocks scattered across the field, according to complex, previously observed heterogeneity.

Of course, the choice of design and the dimensions and orientation of the blocks used, if any, depend on the heterogeneity perceived in the trial area (e.g. in soil, slope, exposure, pest infestation, cultivar). Such variables are never entirely uniform, and a randomized block design in a moderately heterogeneous area will usually give more useful information on product performance than a fully randomized trial in an area thought to be homogeneous, but which subsequently transpires not to be. Block layout will also depend on plot size and shape (Figs 5 and 6). In general, smaller blocks are more effective in reducing heterogeneity. In trials with a large number of treatments, other designs should be considered (e.g. lattice designs, incomplete block designs). Randomized block trials carried out in different regions with distinct environmental conditions and/or in different years may in appropriate cases be considered as a trial series. In the statistical analysis it is then necessary to separate the additional between-sites variance from the variance between blocks, and also to estimate a site × treatment interaction, which may be of particular interest. Note that, in each separate trial, the treatments should be randomized anew within each block.
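The requirement that treatments be randomized anew within each block can be sketched as follows (an illustrative Python fragment; the function name and seed are my own, not part of the standard):

```python
import random

def randomized_complete_block(treatments, blocks, seed=None):
    """Return one independently shuffled copy of the treatment list per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(blocks):
        order = list(treatments)
        rng.shuffle(order)  # a fresh, separate randomization for every block
        layout.append(order)
    return layout

# Four blocks, eight treatments: each treatment appears exactly once per block.
for block in randomized_complete_block(list(range(1, 9)), 4, seed=2):
    print(block)
```

Each inner list is a permutation of all eight treatments, so every treatment occurs once, and once only, within each block, while the order differs from block to block.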
1.2.3 Split-plot design

When a multifactorial trial is carried out, the usual design is a randomized complete block, with each treatment combination occurring once in each block. However, sometimes one of the factors cannot be randomized fully to the plots in a block. For example, suppose a trial had two factors: product (with four levels, labelled 1–4) and cultivation equipment (with three levels, labelled A, B, C), and that plots were relatively small. Then the size of the machinery needed to apply the cultivation treatment may preclude full randomization over the 12 plots in each block. In that case, a split-plot design is recommended: in each block, subplots are associated together in groups of four to form three whole plots per block, the factor cultivation is randomized to these whole plots, and the factor product is randomized, separately, to the subplots within each whole plot (Fig. 7). With a split-plot design, a slightly more complex analysis of variance is required, in which there are two strata, each having a separate error mean square, against which to test the effects of the different factors and their interaction.

Figure 7. An example of a split-plot design. The two treatment factors are: product (1, 2, 3, 4, randomized to subplots within whole plots) and cultivation method (A, B, C, randomized to whole plots within each of the two blocks).

1.2.4 Systematic designs

Non-randomized, systematic designs are not suitable for efficacy evaluation purposes, except in some very special cases (e.g. varietal trials on herbicide selectivity). In general, they are only suitable for demonstration trials.

1.3 Power of the trial

In planning experiments, it is important to consider the power required for any statistical tests that are to be performed. The power is the probability of detecting a given difference between treatments if this difference exists.
The power depends on a number of parameters, among which are:
• the precision of the results (residual variation)
• the number of replicates, including any replication over sites.

A design should be chosen which gives a good chance of detecting, with statistical significance, a difference which is of practical importance for the contrast in which one is interested. One may also have the related requirement that confidence intervals on treatment estimates should be no more than some predetermined width. Before the trial is started, the choice should be made between the performance of a single trial or of a trial series. According to EPPO Standard PP 1/226 Number of efficacy trials, the performance of a plant protection product should be demonstrated by conducting a number of trials in different sites, regions and years under distinct environmental conditions. Therefore, to study the performance of a plant protection product, a trial series may also be planned, conducted and analysed (see also 3.4.1 for a definition of a trial series). In general, there may be results from previous experiments to indicate the likely variability of observations. If such data exist, it is possible to make some judgement as to the design and size of the experiment needed to give the required power. Sometimes it is possible from theoretical considerations to determine the numbers required: for example, with binomial data, an upper limit can be put on the variability of proportions. Various computer-based or graphical systems are available to assist in determining the number of replicates needed. These use the magnitude of the difference required to be estimated, or the level of significance required for that difference, and the precision expected. Some simple general rules are indicated in the next section.

1.4 Number of treatments and replicates in relation to degrees of freedom

For a useful statistical analysis to be made, the number of residual degrees of freedom (d.f.)
should be sufficiently large. In a trial with 8 treatments and 4 replicates in a randomized block design, there are 21 residual d.f. These are calculated as: total d.f. (32 − 1 = 31) minus treatment d.f. (8 − 1 = 7) minus block d.f. (4 − 1 = 3), i.e. 31 − 7 − 3 = 21. In a trial with 3 treatments and 4 replicates repeated at 4 sites, there are 24 residual d.f. These are calculated as: total d.f. (48 − 1 = 47) minus treatment d.f. (3 − 1 = 2) minus site d.f. (4 − 1 = 3) minus treatment × site interaction d.f. ((3 − 1) × (4 − 1) = 6) minus replicate-within-site d.f. ((4 − 1) × 4 = 12), i.e. 47 − 2 − 3 − 6 − 12 = 24. Residual d.f. may be increased by increasing the number of replicates, treatments or sites. The desired number of residual d.f. depends on the degree of precision (power) required of the trial; expert statistical advice should be sought if in doubt. In general, experience with trials and trial series on efficacy evaluation has shown that one should not lay out trials or trial series with fewer than 12 residual d.f. If for any relevant reason it is advisable to use only 3 replicates and 3 treatments, the trial may be executed on at least 4 sites, which gives 16 residual d.f. and so exceeds the minimum of 12 required for a useful statistical analysis. The choice of the experimental design also has an influence on the number of residual d.f. The fully randomized design gives the maximum number. The randomized block design uses some of these d.f. to allow for the heterogeneity of the environment (such as that along one gradient). The split-plot design uses d.f. to allow for more than one component of variation. The experimenter should try to leave the maximum number of d.f. to estimate the residual variation, while choosing an optimal design that minimizes that variation by allowing for all the known sources of heterogeneity (see EPPO Standard PP 1/181).
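Both worked examples above reduce to the same rule: residual d.f. = sites × (treatments − 1) × (replicates − 1). A small sketch (the helper name is my own, not from the standard) reproduces the calculations:

```python
def residual_df(treatments, replicates, sites=1):
    """Residual d.f. for a randomized complete block trial,
    optionally repeated at several sites (blocks nested within sites)."""
    total = sites * treatments * replicates - 1
    model = (treatments - 1)                    # treatments
    model += (sites - 1)                        # sites
    model += (treatments - 1) * (sites - 1)     # treatment x site interaction
    model += sites * (replicates - 1)           # blocks (replicates) within sites
    # Algebraically this equals sites * (treatments - 1) * (replicates - 1).
    return total - model

print(residual_df(8, 4))           # single-site example: 21
print(residual_df(3, 4, sites=4))  # four-site example: 24
```

The same function reproduces every entry of Table 1, which is simply this formula tabulated over sites, treatments and replicates.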
The relationship between the number of replicates and the residual degrees of freedom for differing numbers of treatments and sites can be extracted from Table 1.

Table 1. Residual degrees of freedom in relation to number of sites, treatments and replicates at a site

              |        1 site          |        4 sites         |        6 sites
  Replicates  |  3   4   5   6   7   8 |  3   4   5   6   7   8 |  3   4   5   6   7   8
  ------------+------------------------+------------------------+------------------------
  Treatments 3|  4   6   8  10  12  14 | 16  24  32  40  48  56 | 24  36  48  60  72  84
             4|  6   9  12  15  18  21 | 24  36  48  60  72  84 | 36  54  72  90 108 126
             5|  8  12  16  20  24  28 | 32  48  64  80  96 112 | 48  72  96 120 144 168
             6| 10  15  20  25  30  35 | 40  60  80 100 120 140 | 60  90 120 150 180 210
             7| 12  18  24  30  36  42 | 48  72  96 120 144 168 | 72 108 144 180 216 252
             8| 14  21  28  35  42  49 | 56  84 112 140 168 196 | 84 126 168 210 252 294

1.5 Experimental units/plots: size, shape, need for borders

The experimental unit is that part of the trial material to which a single treatment is applied and on which observations are made. Sufficient units are necessary for the planned treatments and replications. In practice, trial material is limited and compromises may often be necessary. Examples of experimental units are: an area of crop (plot), a container of one or more plants, a single plant, a part of a plant (e.g. leaf, stem, branch) and a baiting point in a field. The experimental units should be chosen to be representative of the population the trial is testing and to be as uniform as possible. Lack of uniformity can sometimes be mitigated with replicate blocks. In general, plots should be rectangular, of the same size within one trial and of similar size across a trial series. Accuracy increases with plot size, but only up to a certain limit, because variability in soil and infestation conditions also tends to increase. Long thin rectangular plots are suitable for mechanical harvesting. Nearly square plots reduce the risk of interference between plots. For observations of spatially aggregated pests, such as some weeds and soil-borne diseases, more smaller plots are better than fewer larger plots.
Plot size is given in specific EPPO standards for particular crop/pest combinations. In cases where interference between plots is liable to occur, the plots will be larger (gross plot) and the observations will be limited to the central area (net plot). The difference between the net plot and the gross plot is called the discard area. In general, the EPPO standards suggest net plot sizes, and the gross plot size is usually left to the experimenter, who should determine the discard areas necessary by considering all the potential sources of interference between plots in each trial or trial series. One common source of interference is spread of the product (for example spray or vapour drift, or lateral movement on/in soil) outside the plot to contaminate adjacent plots. This can be particularly important for sprays applied to tall crops; greater discard areas can often reduce this source of experimental error. Another common source of interference is spread of the pest (for example air-borne fungi or highly mobile insects) from untreated plots or from plots where control of the pest is poorer. Such spread can both increase the pest population in plots with more efficacious treatments and decrease it in plots with less efficacious ones. Similarly, if a product is being tested in a crop where integrated control is practised, adverse effects on predators and parasites may be masked by their migration between plots. A further source of interference is competition for light and nutrients; this is particularly relevant if yield is to be measured. If guard areas between plots are different from the plots themselves (e.g. bare paths, a different crop), caution must be exercised when selecting the area for assessment. Depending on the application or harvesting equipment used, the net plot size may need to be larger than that needed for observations alone. Plots may be laid out across or along the direction of work (sowing or planting). The crosswise layout (Fig.
8) has the advantage that, if some mistake is made in the work (cultivation, sowing, etc.), all plots in a block will probably be equally affected; however, treatment and harvesting then become more difficult. The lengthwise layout offers practical advantages for treatment and harvesting, but runs the risk of greater heterogeneity along very long blocks. A hybrid layout may provide a compromise.

Figure 8. Similar randomized block designs, but with different layouts of plots relative to the direction of work.

1.6 Role and location of untreated controls

1.6.1 Purpose of the untreated control

The main feature of ‘untreated controls’ is that they have not been subjected to any of the plant protection treatments under study. Untreated controls should, however, receive all the measures which are applied uniformly throughout the trial, in particular cultural measures and applications against pests not being studied. Though the untreated control normally receives no treatment at all against the pest being studied, in certain circumstances it may be useful to modify the untreated control to include certain operations received by the other treatments. For example, where the other treatments receive the products in aqueous solution through the passage of spray machinery over the plot, the untreated control may be modified to include a passage of the spray equipment, but with water alone. The idea is to replicate, as far as possible, the operations of the other treatments, with the exception only of the application of the product itself. The main purpose of the untreated control is to demonstrate the presence of an adequate pest infestation: unless an untreated control has confirmed this, efficacy cannot be demonstrated and the results are not meaningful. This confirmation may be qualitative (presence of dominant species, type of flora, weeds, etc.)
or quantitative (compliance with minimum and maximum thresholds, spatial distribution). Under exceptional circumstances, an untreated control may not be possible (e.g. for quarantine pests). Depending on the objective and the type of experiment, untreated controls play a useful role, and possibly several roles at the same time. Among them are:
• showing the efficacy of a new product and the reference product. The primary proof of the efficacy of a new or reference product is always obtained from a comparison with the untreated control
• assistance in making observations. A visual estimation of damage or infestation may sometimes be done in relative terms, by comparison with a control
• use of the technique of the ‘adjacent control’ to measure and take account of spatial distribution in the plots
• observation of the development of the pest (emergence, flight, spore release, etc.), in particular as a basis for determining dates for application or observation
• provision of a reserve of inoculum in order to ensure that the inoculum level does not fall too far or become too heterogeneous (in extreme cases, this may be practically equivalent to artificial infestation)
• assistance in interpreting the results of trials. For example, a significant difference between two treatments may not have the same importance depending on the level of infestation
• making the results of the analysis more accessible for users by expressing them in a different form, or by allowing their graphical representation (e.g. transformation of mortality into efficacy rate)
• allowing for additional observations, in particular of quantitative or qualitative yield, which it may be interesting to link with the other results of the trial
• finally, and exceptionally, formation of a comparison term for the treatments under study if no reference product is available.
This may occur, for example, when the type of product or its use is new, or when all potential reference products have been withdrawn from use. This role is then similar to that of the reference product, although its interpretation is very different. Controls may then be compared with the various treatments using formal statistical significance tests, in the same way as the reference product is compared with them in usual trials.

1.6.2 Types of arrangement of untreated controls

Four types of arrangement of the control are possible.

Included controls: the controls are considered like any other treatment, the control plots are the same shape and size as the other plots, and the controls are randomized in the trial. The included control is the most usual way to carry out trials; all other versions are used exceptionally (mostly in herbicide testing).

Imbricated controls: the control plots are arranged systematically in the trial. Plot size and shape need not be the same as for the other plots in the trial. The observations made in these plots are of a different nature and should not be included in the statistical analysis. The purpose of the arrangement is to ensure a more homogeneous distribution of the effect of an adjacent untreated area than is possible with the included randomized design. Various arrangements are possible; the plots may be placed between blocks or between treated plots within blocks (Fig. 9).

Figure 9. An example of the use of imbricated control plots for a randomized block trial with four blocks and four treatments.

Excluded controls: control plots are selected outside the trial area and not adjacent to it, in an area with conditions closely similar to those of the trial. Replication is not essential but may be useful if the area is not homogeneous. The observations made in these plots should not be included in the statistical analysis.
Adjacent controls: each plot is divided equally into two subplots, and one of these (chosen at random) is left untreated. Observations are made in the same way in both subplots. The observations made in these plots should not be included in the statistical analysis unless due allowance is made for the fact that the design is a form of split-plot. In a split-plot design, the variability within plots may differ from that between plots; consequently, the analysis of variance should include two strata of error. Specialist statistical advice may be necessary to interpret the results.

1.6.3 Choice of the type of arrangement of untreated control

The choice of the arrangement of the untreated control depends on its role(s) in the trial. Although the included control has very often been used in the past in efficacy evaluation trials, and is still frequently used in practice, it is not necessarily the most suitable. The following decision scheme gives guidance.
(a) If the control is used in a statistical test, then the ‘included control’ is essential. If not, another type of control can be used. In either case, the heterogeneity of the plots should be considered
(b) If heterogeneity is high, the ‘adjacent control’ is suitable. If heterogeneity is low or moderate, the interference of the control plots with the adjacent plots should then be considered
(c) If the control plots are not liable to interfere with adjacent plots, then the ‘imbricated control’ is suitable
(d) If control plots are liable to interfere with adjacent plots, then the ‘excluded control’ should be used.

1.7 Selection of the sample size in a plot

The main purpose of taking several samples inside a plot is to reduce the variability of the estimated plot mean to a level suitable for the assessed variable. The sample size should be large enough to achieve this purpose. The sample size required depends greatly on the nature of the observation and the variability within the plot.
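Under the common assumption of roughly independent sampling units, the standard error of the estimated plot mean falls only with the square root of the sample size, which is why returns diminish quickly. A minimal sketch (the function name and the numerical values are illustrative only, not from the standard):

```python
import math

def plot_mean_se(within_plot_sd, n_samples):
    """Standard error of an estimated plot mean from n random samples
    within the plot, assuming approximately independent sampling units."""
    return within_plot_sd / math.sqrt(n_samples)

# Diminishing returns: quadrupling the sample size only halves the
# standard error of the estimated plot mean.
for n in (10, 20, 40):
    print(n, round(plot_mean_se(8.0, n), 2))
```

This is also why, once the plot mean is estimated well enough, further within-plot samples buy little: treatment comparisons are made at the between-plot scale, where replication of plots, not of samples, drives precision.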
EPPO standards on the assessment of specific pests, weeds and diseases give advice on sample sizes. In practice, sample sizes of 10–50 elements are usually enough to give a satisfactory estimate of the mean value in a plot, depending on the inherent variability. Note that, if the treatments are applied to plots, then increasing the sample size within plots gives only a strictly limited return of efficiency, because between-treatment comparisons should be made at the between-plot scale. Sampling should always be random and should adequately cover the area of the plot and the experimental material. For practical reasons, subsampling may be necessary. A review of sampling methods and references to the literature may be found in the bibliography at the end of these standards.
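Random sampling that covers the plot area can be sketched, for example, by drawing uniform random coordinates within the net plot (an illustrative fragment; the plot dimensions and function name are my own, not from the standard):

```python
import random

def random_sampling_points(n, plot_length, plot_width, seed=None):
    """Draw n uniformly distributed sampling coordinates within a net plot,
    so that the sample covers the whole plot area without observer bias."""
    rng = random.Random(seed)
    return [(rng.uniform(0, plot_length), rng.uniform(0, plot_width))
            for _ in range(n)]

# Twenty sampling points in a hypothetical 10 m x 2.5 m net plot.
points = random_sampling_points(20, 10.0, 2.5, seed=3)
```

Pre-generating the coordinates before going into the field (rather than choosing points by eye) is what keeps the sample random; stratified or systematic-with-random-start schemes are common refinements when even coverage matters.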
