Abstract

A goodness-of-fit statistic D² is introduced for use in multinomial distributions. Pearson's X² and D² are both approximately normally distributed when the sample size N is not large relative to the number of multinomial categories k. Under sequences of local alternative hypotheses the test based on D² exhibits moderate power when the X² test is biased. Application is made to the analysis of large sparse contingency tables. A theorem is presented that describes the likelihood ratio Λ for testing a simple multinomial distribution against a mixture of multinomial distributions. A wide variety of mixing distributions is considered, and the D² statistic is a special case of log Λ when testing for Dirichlet mixtures of multinomial distributions.

In the case where N is large, relative to k, X² and D² + k behave approximately as chi-squared random variables and differ by a very small amount under the null hypothesis. In this situation X² and D² + k will generally yield the same inference. With a large, sparse multinomial distribution where N and k are both large, X² and D² will usually behave as normal random variables with means and variances that are unrelated to the chi-squared distribution. In the sparse distribution, X² and D² are not equivalent and X² will accept the null hypothesis too often under certain alternative hypotheses. When testing for independence of rows and columns in a large, sparse two-dimensional contingency table, estimate the cell means conditional on the marginal totals in the usual manner. The D̂² statistic resulting from substituting these estimated means for the cell means is highly correlated with D². The normalizing mean and variance of D̂² under the null hypothesis of independence are given at the end of the article.
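
To make the relationship between the two statistics concrete, the following is a minimal sketch in Python. The abstract does not state the formula for D², so the sketch assumes the form D² = Σ [(n_i − N p_i)² − n_i] / (N p_i), that is, Pearson's X² minus Σ n_i / (N p_i). That assumption should be checked against the article itself. The function names `pearson_x2` and `zelterman_d2` are hypothetical and chosen only for illustration.

```python
def pearson_x2(counts, probs):
    """Pearson's X^2 for observed counts against null cell probabilities."""
    n_total = sum(counts)
    return sum(
        (n - n_total * p) ** 2 / (n_total * p)
        for n, p in zip(counts, probs)
    )


def zelterman_d2(counts, probs):
    """D^2 under the ASSUMED form sum(((n_i - m_i)^2 - n_i) / m_i), m_i = N * p_i.

    This definition is an assumption; it is not given in the abstract.
    """
    n_total = sum(counts)
    return sum(
        ((n - n_total * p) ** 2 - n) / (n_total * p)
        for n, p in zip(counts, probs)
    )


# A small sparse example: N = 4 observations spread over k = 4 equiprobable cells.
counts = [3, 0, 1, 0]
probs = [0.25, 0.25, 0.25, 0.25]

x2 = pearson_x2(counts, probs)
d2 = zelterman_d2(counts, probs)
print(x2, d2, d2 + len(counts))
```

Under the assumed definition, when the null cell probabilities are all equal, Σ n_i / (N p_i) is exactly k, so D² + k coincides with X² in this example. With unequal probabilities the two differ by a random amount whose expectation under the null hypothesis is zero.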