Abstract

Many goodness-of-fit problems can be put in the following form: n = (n_1, …, n_K) is a multinomial(N, π) vector, and it is desired to test the composite null hypothesis H_0: π = p(θ) against all possible alternatives. The usual tests are based on Pearson's (1900) statistic X^2 or the likelihood ratio statistic G^2 (Neyman and Pearson 1928). Cressie and Read (1984) pointed out that both of these statistics belong to the power-divergence family of statistics 2NI^λ, with λ = 1 and λ = 0, respectively; they suggested an alternative statistic with λ = 2/3. Although all of these statistics are asymptotically χ^2 in the usual situation of K fixed and N → ∞, this is not the case if the multinomial is sparse; specifically, Morris (1975) showed that, under certain regularity conditions as both K → ∞ and N → ∞, both X^2 and G^2 are asymptotically normal (with different means and variances) under a simple null hypothesis. Cressie and Read (1984) extended these results to the general 2NI^λ family. Although these results have not been proved for composite nulls, it is certainly reasonable to expect that they continue to hold. Clearly, testing would require an estimate of the variance of the statistic that is valid under composite hypotheses in the sparse situation. In this article the use of nonparametric techniques to estimate these variances is examined. Simulations indicate (and heuristic arguments support) that although the bootstrap (Efron 1979) does not lead to a consistent variance estimate, the parametric bootstrap, the jackknife (Miller 1974), and a “categorical jackknife” (in which cells, rather than observations, are deleted) each lead to a consistent estimate. Simulations indicate that the jackknife is the nonparametric estimator of choice, and that it is superior to the usual asymptotic formula for sparse data. Although these comparisons are based on the unconditional variance of the statistics, it is shown that the unconditional variance and the variance conditional on fitted parameter estimates are asymptotically equal if the underlying probability vector comes from the general exponential family. Simulations also indicate that the jackknife estimate of variance is the estimator of choice in general parametric models for multinomial data.
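For concreteness, the power-divergence family referenced above has the standard Cressie and Read (1984) form; this display is reconstructed from the literature rather than taken from the article's own typesetting:

```latex
2NI^{\lambda}\!\left(\frac{\mathbf{n}}{N} : \mathbf{p}\right)
  = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{K}
      n_i \left[ \left( \frac{n_i}{N p_i} \right)^{\lambda} - 1 \right],
  \qquad -\infty < \lambda < \infty,
```

where λ = 1 recovers Pearson's X^2, the limit λ → 0 gives G^2, and λ = 2/3 is the Cressie–Read compromise; under the composite null, p_i = p_i(θ̂) with θ̂ fitted from the data.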
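As a rough illustration of the delete-one-observation jackknife discussed above, here is a minimal Python sketch (not the article's code; the names power_divergence and jackknife_variance are hypothetical). It assumes a simple null with known cell probabilities p for brevity; under a composite null one would re-fit p(θ) to each leave-one-out sample before recomputing the statistic.

```python
import numpy as np

def power_divergence(n, p, lam=2/3):
    """Cressie-Read power-divergence statistic 2N I^lam for counts n
    against hypothesized cell probabilities p (lam = 1 gives Pearson's
    X^2; lam = 0 gives the likelihood ratio statistic G^2, by continuity)."""
    n = np.asarray(n, dtype=float)
    expected = n.sum() * np.asarray(p, dtype=float)
    if lam == 0:
        pos = n > 0  # empty cells contribute 0 to G^2
        return 2.0 * np.sum(n[pos] * np.log(n[pos] / expected[pos]))
    return 2.0 / (lam * (lam + 1)) * np.sum(n * ((n / expected) ** lam - 1))

def jackknife_variance(n, p, lam=2/3):
    """Delete-one-observation jackknife variance of the statistic.
    All observations in a cell give the same leave-one-out value, so
    only K recomputations are needed, weighted by the cell counts."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    t_loo = np.zeros(len(n))
    for i in range(len(n)):
        if n[i] > 0:  # nothing to delete from an empty cell
            n_minus = n.copy()
            n_minus[i] -= 1.0
            t_loo[i] = power_divergence(n_minus, p, lam)
    t_bar = np.sum(n * t_loo) / N  # mean of the N leave-one-out values
    return (N - 1) / N * np.sum(n * (t_loo - t_bar) ** 2)

# Example: a sparse table with N = 20 observations spread over K = 15 cells
rng = np.random.default_rng(0)
p0 = np.full(15, 1 / 15)
counts = rng.multinomial(20, p0)
print(power_divergence(counts, p0), jackknife_variance(counts, p0))
```

Following the Morris-type asymptotics described above, the statistic centered at its estimated mean and scaled by the square root of such a variance estimate could then be compared with a normal reference in the sparse regime.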