Abstract
Many of the current texts and reference books used by educational researchers are quite sharp in their condemnation of the practice of performing t-tests on each pair of means as a method of testing the hypothesis that, in a completely randomized design, μ1 = μ2 = ... = μk when k > 2. The practice generally recommended as a suitable alternative, however, has but little more merit.

The authors of these works quite rightly indicate that when a given critical value is set for testing all the possible pairs of differences between a number of means in this way, the probability of at least one false significant difference will be greater than the critical value. The practice sometimes suggested and followed today involves applying the F-test for analysis of variance and following that with t-tests for all possible comparisons between means. Such a procedure ignores one of the principal arguments against applying the t-test in the first place. As Steel and Torrie (19) point out, with three treatments the observed value of t for the greatest difference will exceed the tabled .05 level about 13% of the time even if the treatment populations truly do not have differing means. This change in significance level is progressive, and for six treatments the value of t for the greatest difference will be greater than the tabled .05 level about 40% of the time.

It is true that some protection against the error of rejecting a true hypothesis of no difference is provided if specific comparisons are made only after the application of the F-test and only if the result indicates rejection of the hypothesis that μ1 = μ2 = ... = μk. It would seem, however, that at this point it would be appropriate to turn to one of a number of procedures for specific comparisons among treatment means which have been developed in recent years. These procedures, in general, are designed to set an experiment-wise error rate or level of significance.
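The inflation of the error rate described above can be illustrated with a small Monte Carlo sketch (not part of the original article; the group size n = 10, the number of replications, and the hardcoded critical value are illustrative assumptions). It draws all k groups from a single normal population, runs every pairwise t-test at the nominal .05 level, and counts how often at least one comparison is falsely declared significant:

```python
import random
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (sp2 * (1/na + 1/nb)) ** 0.5

def familywise_error_rate(k, n=10, reps=5000, seed=1):
    """Fraction of simulated experiments, with all k population means
    truly equal, in which at least one of the k(k-1)/2 pairwise t-tests
    exceeds the nominal two-tailed .05 critical value."""
    rng = random.Random(seed)
    t_crit = 2.101  # two-tailed .05 critical value of t with 2n - 2 = 18 df
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        if any(abs(two_sample_t(groups[i], groups[j])) > t_crit
               for i in range(k) for j in range(i + 1, k)):
            hits += 1
    return hits / reps
```

With k = 3 the simulated rate comes out near the 13% figure cited from Steel and Torrie, and it grows steadily as k increases, which is the point of the passage: the per-comparison .05 level does not control the experiment-wise error rate.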
A number of the apparently more useful methods of comparison have received some attention in recent years (5, 9, 10, 11, 15, 19, 22). In an article in the September 1955 issue of Psychological Bulletin, McHugh (9) discussed the problem of examining comparisons between means which are suggested by the data. A method proposed by Scheffé (13) was applied to two examples of this type of situation. Additional interest in such a posteriori or post-mortem analyses was stirred by a paper prepared by Stanley (15) and also published in the Psychological Bulletin. Stanley pointed out that Scheffé's test, while offering an experiment-wise level of significance, was not as powerful as some other tests proposed. A procedure developed by Tukey (20) was offered as an alternative. For the instance in which comparisons of each experimental group with a control only are desired, Stanley offered the procedure suggested by Dunnett (4).

More recently, Thomas A. Ryan (11) discussed the questions of logic which arise in choosing methods of dealing with multiple comparisons and other multiple statistical tests. He contended that the decision with respect to which methods of comparison are correct should not revolve around whether the comparisons being made were specified in advance of data collection (a priori) or were suggested by the data (a posteriori). Rather, wrote Ryan, the classical methods are inappropriate for multiple comparisons under any circumstances and should be replaced by the newer methods in either instance. It should be noted that Ryan also presented a very comprehensive discussion of the considerations involved in determining the actual error rate or level of significance in a given experiment involving multiple comparisons. A subsequent exchange between Gaito (6) and Ryan (12) brought further clarification of the problem and the proposed solutions.
Nevertheless, the question of whether the appropriate error rate should be per comparison or per experiment continues to be debated (13, 21), with the weight of evidence seeming to fall on the side of those who espouse the use of an experiment-wise error rate.

One point raised by Ryan (12) with respect to the newer methods for multiple comparisons should be made quite clear. The Scheffé, Tukey, Dunnett, and certain other of the available methods may be applied without the use of an initial F-test of the