Abstract

Krueger’s (January 2001) thesis can be summarized as follows: Induction is central to science (true). From a philosophical perspective, induction cannot be defended logically but can be defended pragmatically—because it leads to progress in science (true). Null hypothesis significance testing (NHST) is a good tool—perhaps the essential tool—for induction and inference from research data (false). NHST is like induction: It cannot be defended logically (true) but can be defended pragmatically (false). The pragmatic defense Krueger offered is the contention that NHST has “proven useful” (p. 24) and that it “rewards the pragmatic scientist” (p. 23; both false).

Induction as used by Krueger (2001) is the same as hypothesis testing; it is undisputed that hypothesis testing is indispensable for science. Krueger’s position, then, reduces to the proposition that NHST is the best procedure—and perhaps the essential procedure—for testing hypotheses. This false argument has long been offered as a defense of significance testing (Schmidt & Hunter, 1997, pp. 42–44). In its strong form, the argument is that without significance testing, psychologists could not have a science because they would no longer be able to test hypotheses. The physical sciences, such as physics and chemistry, do not use NHST or statistical significance tests of any kind, yet these sciences test hypotheses and have done so for centuries. In fact, in contrast to many psychologists, most researchers in the physical sciences regard reliance on significance testing as unscientific (Schmidt, 1996). If the argument is that hypothesis testing requires the use of significance tests, then the logical implications are that physicists and chemists are not really testing hypotheses and that their research is not really scientific. How plausible is this? If the argument is that significance testing is the best method of testing hypotheses in science, then the logical implication is that the hypothesis-testing methods used in physics and chemistry are suboptimal and inferior to those based on NHST and typically used in psychology. Does anyone really believe this? Hence, although induction (hypothesis testing) is central to science, neither NHST nor any other form of statistical significance testing is required for hypothesis testing.

Significance testing almost invariably retards the search for knowledge by producing false conclusions about research literature. The evidence is strong that the null hypothesis is almost always false in psychological research. For example, Lipsey and Wilson (1993) examined 302 meta-analyses of psychological interventions of all kinds in many areas of psychology. In only 2 of these meta-analyses were the effect sizes (ESs) zero or near zero (less than 1%). An examination of all published meta-analyses would produce a similar figure for psychology as a whole. If the null hypothesis is typically false, then Type I error is not important, because it is impossible to make a Type I error when the null is false. What is important is Type II error: failing to detect the effect or relation that is there. One minus the Type II error rate is the statistical power of the study: the probability of detecting the effect or relation. The evidence is clear that the average level of statistical power in psychological research is between .40 and .60 (e.g., see Cohen, 1962, 1994; Sedlmeier & Gigerenzer, 1989).
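To make this arithmetic concrete, the minimal sketch below simulates a literature in which the null hypothesis is false in every study but power is close to .50. All parameters (a true effect of d = 0.5, 32 participants per group, 10,000 studies) are illustrative assumptions, not values drawn from the studies cited above.

```python
# A minimal power simulation (all parameters are illustrative assumptions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, alpha, n_studies = 0.5, 32, 0.05, 10_000  # n = 32/group gives power near .50 for d = 0.5

nonsignificant = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)            # the true effect exists in every study
    _, p = stats.ttest_ind(treatment, control)   # two-sided two-sample t test
    if p >= alpha:                               # study fails to detect the effect
        nonsignificant += 1

print(f"Studies failing to detect the effect: {nonsignificant / n_studies:.2f}")
```

With power near .50, roughly half of the simulated studies come out nonsignificant even though the effect is present in every one of them.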
The operational decision rule used by researchers is “if it is significant, it is real; if it is not significant, it is zero” (Schmidt, 1996). Hence, the error rate in the typical psychological research literature is approximately 50%—that is, half of all studies reach false conclusions about the null hypothesis, a situation of maximal apparent conflict in the literature. As discussed by Schmidt (1996), this leads to one of two false conclusions about the meaning of the literature. The first is that the literature is so conflicting that nothing can be concluded. The second is that there are interactions or moderator variables that cause the effect to exist in some studies and to be nonexistent in others and that research should be directed at finding these moderator variables. Meta-analysis typically indicates that both of these conclusions are false—by revealing that the effect exists in all studies (Schmidt, 1996).

Significance tests are a disastrous method for testing hypotheses, but a better method does exist: use of point estimates (ESs) and confidence intervals (CIs). First, unlike significance tests, CIs hold the real error rate to .05 (or whatever confidence level is set); there is no possibility of a higher error rate as with significance tests. In particular, the true error rate will never be 50% when the researcher thinks it is 5% (because the alpha level is set at .05). Second, almost all of the CIs from different studies overlap each other, correctly suggesting that the studies are not contradictory (see the sketch below). Third, the CI clearly reveals the level of uncertainty in the study results; unlike the significance test, the CI provides an index of the effects of sampling error on the results. Finally, the ES provides the information needed for subsequent meta-analyses, whereas the significance test does not.

Krueger (2001) stated, “In daily research activities, NHST has proven useful. Researchers make decisions concerning the validity of hypotheses, and although their decisions sometimes disagree, they are not random or arbitrary” (p. 24). First, Krueger presented no evidence to support his assertion of NHST’s usefulness. As shown above, significance testing creates confusion and false conclusions about research literature. How is this useful? Second, researchers’ conclusions disagree more often than sometimes—in the typical research literature, they disagree 50% of the time.
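The sketch below illustrates the point-estimate-and-CI approach under assumed conditions (simulated data; the effect size and sample sizes are hypothetical). Two studies of the same true effect yield intervals of different widths, making each study’s sampling error visible, and the intervals typically overlap, correctly signaling agreement rather than contradiction.

```python
# A minimal sketch of reporting point estimates with confidence intervals
# (simulated data; all numbers are illustrative assumptions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def mean_diff_ci(treatment, control, confidence=0.95):
    """Point estimate and confidence interval for a mean difference."""
    diff = treatment.mean() - control.mean()
    se = np.sqrt(treatment.var(ddof=1) / len(treatment)
                 + control.var(ddof=1) / len(control))
    df = len(treatment) + len(control) - 2   # pooled df; adequate for a sketch
    half_width = stats.t.ppf((1 + confidence) / 2, df) * se
    return diff, diff - half_width, diff + half_width

# Two studies of the same true effect (d = 0.5), differing only in sample size.
for label, n in [("Study A, n = 64 per group", 64),
                 ("Study B, n = 20 per group", 20)]:
    treatment = rng.normal(0.5, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    est, lo, hi = mean_diff_ci(treatment, control)
    print(f"{label}: estimate {est:+.2f}, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

The wider interval from the smaller study displays its greater sampling error directly—information a bare significance verdict conceals—and because both intervals estimate the same true effect, they generally overlap even when one study happens to cross the significance threshold and the other does not.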
