Abstract Because they require very little storage and can be computationally quite efficient, gradient algorithms are attractive methods for fitting large nonorthogonal analysis of variance (ANOVA) models. A coordinate-free approach is used to provide very simple definitions for a number of well-known gradient algorithms and insights into their similarities and differences. The key to a good algorithm is an algorithm metric that leads to easily computed gradients and that is as close as possible to the metric defined by the ANOVA problem. This leads to the proposal of a new class of algorithms based on a proportional subclass metric. Several new theoretical results on convergence are derived, and some empirical comparisons are made. A similar, but much briefer, treatment of analysis of covariance is given. Regarding theoretical convergence, it is shown, for example, that the Golub and Nash (1982) algorithm requires at most d + 1 iterations if all but d of the cells in the model have the same cell count, that the proportional subclass algorithm converges in one step for proportional subclass problems, and that it needs at most 2 min(a, b) − 1 iterations when fitting the two-way additive ANOVA model of size a by b. This can, for example, lead to large savings for models with many more rows than columns. For the empirical comparisons, a two-way ANOVA model is fitted to artificial and real data. For the problems considered, the proportional subclass algorithm requires the fewest iterations, followed by the Golub and Nash, optimized steepest descent, Hemmerle, and Yates algorithms, in that order. Some of the differences are quite substantial, involving factors of 10 or more.
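To make the setting concrete, the Python sketch below fits the two-way additive ANOVA model y = mu + alpha[row] + beta[col] by least squares using plain steepest descent with an exact line search in the ordinary Euclidean metric. This is an illustration only, not the paper's algorithms: the function name, the indicator-matrix construction, and the stopping rule are assumptions of this sketch, and the algorithms discussed in the abstract differ precisely in the choice of algorithm metric (for example, the proportional subclass metric) used to shape the gradient step.

```python
import numpy as np

def fit_two_way_additive(y, rows, cols, n_iter=500, tol=1e-10):
    """Steepest-descent least-squares fit of y ~ mu + alpha[row] + beta[col].

    `rows` and `cols` are integer arrays giving each observation's cell.
    The exact line search makes this an "optimized" steepest descent in the
    Euclidean metric; it is a generic baseline, not the metric-based
    algorithms compared in the paper. The usual sum-to-zero identifiability
    constraints are omitted; the iteration still converges to one of the
    least-squares solutions.
    """
    y = np.asarray(y, dtype=float)
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    a, b = rows.max() + 1, cols.max() + 1
    n = y.size

    # Design matrix: intercept column, row indicators, column indicators.
    X = np.zeros((n, 1 + a + b))
    X[:, 0] = 1.0
    X[np.arange(n), 1 + rows] = 1.0
    X[np.arange(n), 1 + a + cols] = 1.0

    theta = np.zeros(1 + a + b)
    for _ in range(n_iter):
        r = y - X @ theta          # residuals
        g = X.T @ r                # negative gradient of 0.5 * ||y - X theta||^2
        gg = g @ g
        if gg < tol:               # gradient (effectively) zero: stop
            break
        Xg = X @ g
        theta += (gg / (Xg @ Xg)) * g   # exact line-search step length
    return theta

# Toy usage: a small unbalanced 3 x 2 layout with unequal cell counts.
rng = np.random.default_rng(0)
rows = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
cols = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1])
y = 1.0 + 0.5 * rows - 0.3 * cols + rng.normal(scale=0.1, size=rows.size)
theta_hat = fit_two_way_additive(y, rows, cols)
```

The metric-based algorithms in the abstract can be viewed as replacing the raw gradient step above with a step preconditioned by an easily inverted approximation to X'X; when that approximation matches the problem exactly (as in the proportional subclass case), convergence is correspondingly fast.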