Abstract

We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson’s χ²-test of independence or the G-test is typically used for this task. We argue that these tests have serious deficiencies, both in their inability to control the size of the test and in their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only a minimal way. The test statistic is derived from a U-statistic estimator of a natural population measure of dependence, and we prove that this is the unique minimum variance unbiased estimator of this population quantity. The practical utility of the USP test is demonstrated on both simulated and real data. On simulated data, its power can be dramatically greater than that of Pearson’s test, the G-test and Fisher’s exact test. The USP test is implemented in the R package USP.
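The following minimal Python sketch illustrates the two ingredients named above, under stated assumptions. We assume the population measure of dependence is $D = \sum_{i,j} (p_{ij} - q_i r_j)^2$, where $p_{ij}$ is the joint cell probability and $q_i$, $r_j$ are the marginal probabilities. That choice, the hypothetical function names `usp_estimate` and `usp_permutation_test`, and the count-based formula are our own illustration. They are not the API of the USP package or the closed form given in the paper. We write $D = \sum p_{ij}^2 - 2\sum p_{ij} q_i r_j + \sum_i q_i^2 \sum_j r_j^2$. Each term can then be estimated without bias by averaging indicator kernels over tuples of distinct observations, which yields a U-statistic. The test compares the observed estimate with its values after the $Y$ labels are randomly permuted.

```python
import random
from collections import Counter
from fractions import Fraction


def usp_estimate(xs, ys):
    """Unbiased estimate of D = sum_ij (p_ij - q_i r_j)^2 from paired labels.

    xs, ys: equal-length sequences of hashable category labels, n >= 4.
    Returns an exact Fraction; like many unbiased estimators it can be negative.
    """
    n = len(xs)
    if n != len(ys) or n < 4:
        raise ValueError("need two equal-length samples of size n >= 4")
    cell = Counter(zip(xs, ys))  # o_ij
    row = Counter(xs)            # o_i.
    col = Counter(ys)            # o_.j
    # Ordered pairs of distinct observations falling in the same cell.
    same_cell = sum(o * (o - 1) for o in cell.values())
    # sum_ij o_ij (o_i. - 1)(o_.j - 1): an observation, another sharing its row
    # label and another sharing its column label (these two may coincide).
    m = sum(o * (row[x] - 1) * (col[y] - 1) for (x, y), o in cell.items())
    same_row = sum(o * (o - 1) for o in row.values())
    same_col = sum(o * (o - 1) for o in col.values())
    n2 = n * (n - 1)
    n3 = n2 * (n - 2)
    n4 = n3 * (n - 3)
    est_p2 = Fraction(same_cell, n2)       # estimates sum_ij p_ij^2
    est_pqr = Fraction(m - same_cell, n3)  # estimates sum_ij p_ij q_i r_j
    # Estimates (sum_i q_i^2)(sum_j r_j^2) using four distinct observations.
    est_q2r2 = Fraction(same_row * same_col - 4 * m + 2 * same_cell, n4)
    return est_p2 - 2 * est_pqr + est_q2r2


def usp_permutation_test(xs, ys, n_perm=999, seed=None):
    """Monte Carlo permutation test based on usp_estimate; returns (stat, p)."""
    rng = random.Random(seed)
    observed = usp_estimate(xs, ys)
    ys_perm = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(ys_perm)  # breaks any X-Y link but keeps both margins fixed
        if usp_estimate(xs, ys_perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Exact `Fraction` arithmetic avoids spurious ties or non-ties from floating-point rounding when permuted statistics are compared with the observed one. The p-value `(1 + exceed) / (1 + n_perm)` is the standard Monte Carlo permutation p-value. Under independence, rejecting when it is at most α gives a Type I error of at most α for every sample size. The default `n_perm=999` is an arbitrary choice.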

Highlights

  • Pearson’s χ²-test of independence [1] is one of the most commonly used of all statistical procedures.

  • We show in Appendix A that even in the simplest setting of a 2 × 2 table, and no matter how large the sample size n, it is possible to construct a joint distribution that satisfies the null hypothesis of independence but for which the probability of Type I error is far from the desired level. Practitioners are aware of this deficiency of Pearson’s test and the G-test (e.g. [7, p. 40]), but our example provides an explicit demonstration.

  • We show that the U-statistic permutation (USP) test statistic is derived from the unique minimum variance unbiased estimator of a natural measure of dependence in a contingency table. To complement these theoretical results, we present several numerical comparisons between the USP test and both Pearson’s test and the G-test, as well as another alternative, namely …
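The Type I error problem described in the highlights can be probed by simulation. The Python sketch below draws 2 × 2 tables under independence, using hypothetical marginal probabilities `q` = P(X = x₁) and `r` = P(Y = y₁). It then applies Pearson’s test with its usual χ²₁ approximation (no continuity correction) and records the rejection rate. It does not reproduce the construction referred to above, which is not shown here. The function names are illustrative. Counting a table with an empty row or column as a non-rejection is our own assumption, because Pearson’s statistic is undefined in that case.

```python
import math
import random


def pearson_statistic(table):
    """Pearson's X^2 for a table given as a list of rows of counts.

    Returns None when some expected count is zero (an empty row or column),
    because the statistic is then undefined.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n  # expected count under independence
            if e == 0:
                return None
            stat += (o - e) ** 2 / e
    return stat


def chi2_1df_sf(x):
    """P(chi^2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for Z ~ N(0, 1)."""
    return math.erfc(math.sqrt(x / 2))


def pearson_type1_rate(q, r, n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the rejection rate of Pearson's 2 x 2 test
    when X and Y are independent, with P(X = x_1) = q and P(Y = y_1) = r (n >= 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        table = [[0, 0], [0, 0]]
        for _ in range(n):
            i = 0 if rng.random() < q else 1  # X and Y are drawn independently,
            j = 0 if rng.random() < r else 1  # so the null hypothesis holds
            table[i][j] += 1
        stat = pearson_statistic(table)
        # Assumption: an undefined statistic (empty row/column) counts as no rejection.
        if stat is not None and chi2_1df_sf(stat) < alpha:
            rejections += 1
    return rejections / reps
```

For instance, `pearson_type1_rate(0.95, 0.95, 20)` estimates the actual size of the nominal 5% test when both margins are highly unbalanced. Repeating this for several values of `n` shows how far the χ² approximation can drift from 0.05. No particular output is claimed here.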


Introduction

Pearson’s χ²-test of independence [1] is one of the most commonly used of all statistical procedures. It is typically employed when we have discrete data consisting of $n$ independent copies of a pair $(X, Y)$. Here $X$ takes the value $x_i$ with probability $q_i$, for $i = 1, \ldots, I$, and $Y$ takes the value $y_j$ with probability $r_j$, for $j = 1, \ldots, J$. For example, $X$ might represent marital status, taking the values ‘Never married’, ‘Married’, ‘Divorced’ and ‘Widowed’, and $Y$ might represent level of education, …

© 2021 The Authors.
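For reference, the setting above can be written in standard notation. Write $p_{ij} = P(X = x_i, Y = y_j)$, and let $o_{ij}$ be the observed count in cell $(i, j)$, with row and column totals $o_{i\cdot}$ and $o_{\cdot j}$. Let $e_{ij} = o_{i\cdot} o_{\cdot j} / n$ be the expected count under independence. These symbols are our own labels rather than necessarily the paper’s. The textbook forms of the null hypothesis and of the Pearson and G statistics are given below. For large $n$, both statistics are referred to a $\chi^2_{(I-1)(J-1)}$ distribution.

```latex
\begin{align*}
H_0 &: p_{ij} = q_i r_j \quad \text{for all } i = 1, \ldots, I, \; j = 1, \ldots, J, \\
X^2 &= \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \\
G &= 2 \sum_{i=1}^{I} \sum_{j=1}^{J} o_{ij} \log \frac{o_{ij}}{e_{ij}} \quad (\text{with } 0 \log 0 = 0).
\end{align*}
```

Both statistics require every $e_{ij}$ to be positive, that is, every row and column total must be nonzero. The χ² approximation is also known to be unreliable when some expected counts are small.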
