Abstract

When applied to frequency tables with small expected cell counts, Pearson chi-squared test statistics may be asymptotically inconsistent even in cases in which a satisfactory chi-squared approximation exists for the distribution under the null hypothesis. This problem is particularly important in cases in which the number of cells is large and the expected cell counts are quite variable. To illustrate this bias of the chi-squared test, this article considers the Pearson chi-squared test of the hypothesis that the cell probabilities for a multinomial frequency table have specified values. In this case, the expected value and variance of the Pearson chi-square may be evaluated under both the null and alternative hypotheses. When the number of cells is large, normal approximations and discrete Edgeworth expansions may also be used to assess the size and power of the Pearson chi-squared test. These analyses show that unless all cell probabilities are equal, it is possible to select a significance level and cell probabilities under the alternative hypothesis such that the power is less than the size of the test. As shown by exact calculations, the difference may be substantial even in cases in which all expected cell sizes are at least 5 under the null hypothesis. The use of moments shows that given any minimum expected cell size under the null hypothesis and given any significance level, it is possible to make the power arbitrarily close to 0 by the selection of a large enough number of cells in the table and suitable cell probabilities for the null and alternative hypotheses. The normal approximations for the distribution of the Pearson chi-squared statistic permit the size of this bias to be assessed in less-extreme cases involving tables with many cells. These results imply that caution must be exercised in the application of Pearson chi-squared statistics to sparse contingency tables with many cells.
An alternative to the Pearson chi-square, proposed by Zelterman (1986), avoids some of the problems. Exact calculation, however, shows that the alternative statistic does not eliminate all problems of bias. The problems described in this article clearly extend to more general applications of the Pearson chi-squared statistic.
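
The moment and exact calculations mentioned above can be sketched for a very small table. The following is a minimal illustration, not the article's own computation: the sample size, the cell probabilities `p_null` and `p_alt`, and all function names are assumptions chosen for the example. It uses the standard identity E[(N_i - n p_i)^2] = n q_i (1 - q_i) + n^2 (q_i - p_i)^2 for multinomial counts with true probabilities q_i, and the fact that a chi-squared variable with 2 degrees of freedom has upper tail exp(-x/2), so the nominal critical value for three cells is -2 ln(alpha).

```python
import math

def compositions(n, k):
    """Yield every tuple of k nonnegative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Exact multinomial probability of observing `counts` under `probs`."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    prob = float(coef)
    for c, q in zip(counts, probs):
        prob *= q ** c
    return prob

def pearson_stat(counts, null_probs):
    """Pearson chi-squared statistic against the hypothesized probabilities."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, null_probs))

def pearson_mean(n, null_probs, true_probs):
    """Exact mean of the statistic when counts follow `true_probs`."""
    return sum((q * (1 - q) + n * (q - p) ** 2) / p
               for p, q in zip(null_probs, true_probs))

def rejection_probability(n, null_probs, true_probs, critical_value):
    """Exact probability that the statistic exceeds `critical_value`."""
    k = len(null_probs)
    return sum(multinomial_pmf(c, true_probs)
               for c in compositions(n, k)
               if pearson_stat(c, null_probs) > critical_value)

# Illustrative values only (assumed for this sketch, not taken from the article).
n = 20
p_null = (0.5, 0.25, 0.25)    # expected counts 10, 5, 5 under the null
p_alt = (0.52, 0.24, 0.24)    # a nearby alternative
alpha = 0.05
critical_value = -2.0 * math.log(alpha)  # chi-squared(2) upper 5% point

size = rejection_probability(n, p_null, p_null, critical_value)
power = rejection_probability(n, p_null, p_alt, critical_value)
print("mean under null:       ", pearson_mean(n, p_null, p_null))
print("mean under alternative:", pearson_mean(n, p_null, p_alt))
print("exact size: ", size)
print("exact power:", power)
```

For these assumed values the mean of the statistic is 2 under the null hypothesis and 1.9904 under the alternative, so the alternative shifts the statistic slightly downward on average. Comparing the printed exact size and power shows whether that shift is enough to push the power below the size for this particular choice; other probabilities and sample sizes can be substituted to explore the effect.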
