Abstract

There are multiple tests of homogeneity of binomial proportions in the statistics literature. However, when working with sparse data, most test procedures may fail to perform well. In this article we review nine classical and recent testing procedures, including the standard Pearson and likelihood ratio tests; exact conditional and unconditional tests; tests based on moment matching chi-squared approximations; a recently proposed test based on a normal approximation in an asymptotic framework for sparse data; and a recent test based on higher order moment corrections using an Edgeworth approximation. For each test we review its theoretical underpinning, and show how to calculate the P-value. Most of the P-values can be readily calculated in a statistical computing software package such as R. We compare type I error probability and power via simulation. As expected, none of the procedures uniformly outperforms the others in terms of type I error probability and power, but we can make some recommendations based on our empirical results. In particular, we indicate scenarios in which certain otherwise reasonable test procedures can perform inadequately.

Highlights

  • We consider the problem of testing the homogeneity hypothesis for k binomial populations of possibly unequal sample sizes based on observing one data point on each of the k populations

  • Using $T_P$ as the test statistic, an exact unconditional test can be obtained by defining the P-value as $\sup_{\pi \in [0,1]} \Pr\{T_P \ge t_P\}$, where the probability is computed with respect to the distribution (10) and depends on $\pi$, i.e., $\sup_{\pi \in [0,1]} \Pr\{T_P \ge t_P\} = \sup_{\pi \in [0,1]} \sum_{(x_1, \ldots, x_k):\, T_P \ge t_P} p_u(x_1, \ldots, x_k \mid \pi)$

  • We find that the PW and PWBB tests have probability of type I error below the nominal level α = 0.05
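The exact unconditional P-value in the second highlight can be illustrated with a small enumeration. The following is a minimal Python sketch, not the authors' implementation. It assumes that T_P is the usual Pearson chi-squared statistic with the pooled proportion estimate, that the statistic is set to 0 when the pooled estimate is 0 or 1, and that the supremum over π is approximated on an evenly spaced grid. The function names and the `grid_size` parameter are hypothetical.

```python
from itertools import product
from math import comb


def pearson_stat(x, n):
    """Pearson chi-squared statistic for homogeneity of k binomial proportions."""
    pi_hat = sum(x) / sum(n)  # pooled estimate of the common proportion
    if pi_hat == 0.0 or pi_hat == 1.0:
        return 0.0  # assumed convention for the degenerate 0/0 case
    return sum((xi - ni * pi_hat) ** 2 / (ni * pi_hat * (1.0 - pi_hat))
               for xi, ni in zip(x, n))


def null_probability(x, n, pi):
    """Probability of outcome x under k independent Binomial(n_i, pi) counts."""
    prob = 1.0
    for xi, ni in zip(x, n):
        prob *= comb(ni, xi) * pi ** xi * (1.0 - pi) ** (ni - xi)
    return prob


def exact_unconditional_pvalue(x_obs, n, grid_size=201, tol=1e-9):
    """Approximate the sup over pi of Pr{T_P >= t_P} on an evenly spaced grid."""
    t_obs = pearson_stat(x_obs, n)
    # Outcomes at least as extreme as the observed one (tol guards against rounding).
    extreme = [x for x in product(*(range(ni + 1) for ni in n))
               if pearson_stat(x, n) >= t_obs - tol]
    p_value = 0.0
    for j in range(grid_size):
        pi = j / (grid_size - 1)  # grid point in [0, 1]
        p_value = max(p_value, sum(null_probability(x, n, pi) for x in extreme))
    return min(p_value, 1.0)
```

For example, `exact_unconditional_pvalue((3, 0), (3, 3))` enumerates the 16 possible outcomes for two groups of size 3. The number of outcomes grows as the product of the (n_i + 1), so full enumeration of this kind is only practical for small sample sizes.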


Summary

Introduction

We consider the problem of testing the homogeneity hypothesis for k binomial populations of possibly unequal sample sizes based on observing one data point on each of the k populations. Generation of synthetic count data in cross-classified contingency tables can be based on an ANOVA type log-linear model for cell probabilities along with a multinomial assumption for the joint distribution of the cell counts (Klein and Creecy, 2010). While a fully saturated log-linear model provides little flexibility, under the independence model, interaction terms are set to zero. This precisely corresponds to the homogeneity of associated cell probabilities across rows or columns, and suggests testing of homogeneity of proportions across rows or columns before using the independence model. The primary focus of this article is to provide a comprehensive comparison by simulation among the available tests in terms of type I error probability and power in sparse data settings.
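A simulation comparison of type I error probability can be sketched as follows. This is an illustrative Python example under stated assumptions, not the simulation design used in the article (which mentions R as one option for computing the P-values). It covers only the standard Pearson test with a chi-squared reference distribution on k − 1 degrees of freedom, treats degenerate samples (all zeros or all successes) as non-rejections, and uses hypothetical function names, replication counts, and seeds.

```python
import random
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    # Closed-form start, then Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    a, q = (1.0, exp(-y)) if df % 2 == 0 else (0.5, erfc(sqrt(y)))
    while a < df / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


def pearson_pvalue(x, n):
    """Asymptotic P-value of the Pearson homogeneity test for k binomials."""
    pi_hat = sum(x) / sum(n)  # pooled estimate of the common proportion
    if pi_hat == 0.0 or pi_hat == 1.0:
        return 1.0  # assumed convention: degenerate data never reject
    t = sum((xi - ni * pi_hat) ** 2 / (ni * pi_hat * (1.0 - pi_hat))
            for xi, ni in zip(x, n))
    return chi2_sf(t, len(n) - 1)


def estimate_type1_error(n, pi, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo rejection rate when all k populations share proportion pi."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # One binomial count per population, drawn as a sum of Bernoulli trials.
        x = [sum(rng.random() < pi for _ in range(ni)) for ni in n]
        if pearson_pvalue(x, n) <= alpha:
            rejections += 1
    return rejections / reps
```

A call such as `estimate_type1_error((5,) * 10, 0.1)` estimates the rejection rate for ten populations of size 5 with a small common proportion, a sparse configuration of the kind the article targets. Comparing the estimate with the nominal level α shows whether the chi-squared approximation is conservative or liberal in that setting.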

Test Procedures
Standard chi-squared and likelihood ratio tests
Exact tests
Empirical Comparison of the Tests
ExactC
Concluding Remarks