An important concept required for understanding formal experimental design is that of degrees-of-freedom (DoFs). Degrees-of-freedom are used in different contexts throughout science; for example, we encountered them when discussing statistical distributions such as that of the F statistic [1], and they have an important role in statistical mechanics. In this article, we will focus on their role in statistical design. A first job in analysing a designed experiment is to determine the number of degrees-of-freedom attached to different sources of error.

We can see that for model B, all residual errors appear to be equal to 0. This is because we have not performed enough experiments to see whether the data are truly linear: there are two experiments and two terms in the model, so no degrees-of-freedom remain for the residual error. For model A, we can calculate residual errors. The interested reader will note that these errors do not sum to 0; this will only happen if the intercept term b0 is included in the model or the data are centred. In most chemometric models, we do include the intercept and so assume that the sum of residual errors is always 0.

Models do not need to be restricted to linear terms. For example, a series of 3 observations can be used to obtain a linear model with no intercept, with D = 2 degrees-of-freedom for lack-of-fit; a linear model with an intercept, with D = 1; or a model including intercept, linear, and quadratic terms, with D = 0. If we really wanted to see whether there was curvature in a univariate relationship, it would be advisable to perform at least 6 or 7 unique experiments, so that there are 3 or 4 degrees-of-freedom for the lack-of-fit error.

There are three factors, and a common model for them was discussed previously [2]. Sometimes, the degrees-of-freedom can be represented by a "degrees-of-freedom tree." In our case, it is represented in Figure 2 and is a good way of summarising a design.
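The counting described above for a series of 3 observations, and the remark that residuals sum to 0 only when an intercept is included, can be checked with a short sketch. It assumes no replicates, so that the lack-of-fit degrees-of-freedom are simply the number of unique experiments minus the number of model terms. The function names and the three data points are hypothetical and chosen only for illustration; they are not taken from the article's tables.

```python
def lack_of_fit_dof(n_unique, n_terms):
    """D = (unique experiments) - (model terms), assuming no replicates."""
    d = n_unique - n_terms
    if d < 0:
        raise ValueError("more model terms than unique experiments")
    return d

# Three observations; number of terms in each candidate model from the text.
models = {
    "linear, no intercept": 1,
    "intercept + linear": 2,
    "intercept + linear + quadratic": 3,
}
dof = {name: lack_of_fit_dof(3, p) for name, p in models.items()}

# Hypothetical data (not from the article) to compare residual sums.
x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.3]

# Least-squares fit of y = b1*x (no intercept).
b1_no_int = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
res_no_int = [yi - b1_no_int * xi for xi, yi in zip(x, y)]

# Least-squares fit of y = b0 + b1*x (with intercept).
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
b0 = y_mean - b1 * x_mean
res_int = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(dof)
print("sum of residuals, no intercept:", sum(res_no_int))
print("sum of residuals, with intercept:", sum(res_int))
```

With the intercept included, the residual sum is 0 up to floating-point rounding, whereas the no-intercept fit generally leaves a non-zero sum for uncentred data.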
Sometimes, degrees-of-freedom trees can be more elaborate, for example, if errors are viewed as coming from different sources. The lack-of-fit error tells us only whether the overall model is a sensible one, and we may, for example, be interested in whether a specific term or group of terms in the model is significant. However, for visualising most common designs encountered in chemometrics, the analysis in this article is a valuable first step.

The residual errors between the estimated and observed values of the response in the design of Table 2 therefore have 10 degrees-of-freedom. Similar lack-of-fit and replicate errors can be calculated, as described in subsequent articles. If there are no replicates, the lack-of-fit error is the same as the residual error. Every type of error has its own degrees-of-freedom associated with it. Sometimes, the mean squared errors are also called variances, as discussed in later articles. Note that if we change the model, for example, by removing the three quadratic terms, we increase the number of degrees-of-freedom for the lack-of-fit and reduce the number of degrees-of-freedom for regression.

If very little is known about whether a model is suitable for a particular set of experiments, a good rule of thumb is to make the number of degrees-of-freedom for the lack-of-fit and replicate errors approximately equal for the desired model. If, however, we are fairly certain that a response behaves in a certain manner, for example, if we are sure that an experiment results in a linear response, we might reduce or even remove replication and focus on unique experiments to obtain as accurate a numerical relationship as possible.

For further reading, Stan Deming and Steve Morgan wrote several articles and books [3] about degrees-of-freedom trees in the 1970s to 1990s.
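The bookkeeping behind a simple degrees-of-freedom tree can be sketched as follows. Only the residual value of 10 comes from the text; the number of experiments (20), the number of model terms (10, which would correspond to an intercept, three linear, three quadratic, and three interaction terms), and the number of replicates (5) are assumptions made for illustration, since Table 2 is not reproduced here. The function name `dof_tree` is likewise hypothetical.

```python
def dof_tree(n_experiments, n_terms, n_replicates):
    """Split the degrees-of-freedom of a design into a simple tree.

    n_replicates is taken as the number of experiments minus the number
    of unique experimental conditions (an assumed convention).
    """
    residual = n_experiments - n_terms
    lack_of_fit = residual - n_replicates
    if residual < 0 or lack_of_fit < 0:
        raise ValueError("not enough experiments for this model and replication")
    return {
        "experiments": n_experiments,
        "regression": n_terms,
        "residual": residual,
        "replicate": n_replicates,
        "lack_of_fit": lack_of_fit,
    }

# Assumed design: 20 experiments, 5 replicates, 10-term model.
full = dof_tree(n_experiments=20, n_terms=10, n_replicates=5)

# Same assumed design with the three quadratic terms removed from the model.
reduced = dof_tree(n_experiments=20, n_terms=7, n_replicates=5)

for label, tree in (("full model", full), ("no quadratic terms", reduced)):
    print(label, tree)
```

Under these assumed numbers, dropping the three quadratic terms moves three degrees-of-freedom from regression to lack-of-fit, while the replicate degrees-of-freedom are unchanged because they depend only on the design.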