Abstract

This paper takes an information-geometric approach to the challenging issue of goodness-of-fit testing in the high dimensional, low sample size context where boundary effects can potentially dominate. The main contributions of this paper are threefold: first, we present and prove two new theorems on the behaviour of commonly used test statistics in this context; second, we investigate, in the novel environment of the extended multinomial model, the links between information geometry-based divergences and standard goodness-of-fit statistics, allowing us to formalise relationships which have been missing in the literature; finally, we use simulation studies to validate and illustrate our theoretical results and to explore currently open research questions about the way that discretisation effects can dominate sampling distributions near the boundary. Accommodating these discretisation effects is a novel feature that contrasts sharply with the essentially continuous approach of skewness and other corrections flowing from standard higher-order asymptotic analysis.

Highlights

  • We start by emphasising the threefold achievements of this paper, spelled out in detail in terms of the paper’s section structure below

  • Working explicitly in the extended multinomial context, we fill a gap in the literature by linking information geometry-based divergences and standard goodness-of-fit statistics

  • One of the first major impacts that information geometry had on statistical practice was through the geometric analysis of higher order asymptotic theory (e.g., [8,9])


Summary

Introduction

We start by emphasising the threefold achievements of this paper, spelled out in detail in terms of the paper’s section structure below. These results explore the sampling performance of standard goodness-of-fit statistics (Wald, Pearson’s χ², score and deviance) in the sparse setting. In particular, they cover the case where the data generation process is “close to the boundary” of the parameter space, where one or more cell probabilities vanish. The paper also examines the power family of Cressie and Read [4,5] in terms of the geometric theory of divergences.
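
To make the statistics named above concrete, the following is a minimal Python sketch of the Cressie and Read power-divergence statistic in its standard textbook form, 2nI^λ = (2 / (λ(λ+1))) Σ O_i [(O_i / E_i)^λ − 1]. It is not taken from the paper itself. The function name `power_divergence` and the example counts are hypothetical, and the sketch assumes that observed and expected counts have the same total. Setting λ = 1 gives Pearson’s χ², and the limit λ → 0 gives the deviance.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic (hypothetical helper).

    Assumes sum(observed) == sum(expected) and all expected counts > 0.
    lam = 1 is Pearson's chi-squared; lam = 0 is the deviance (limit case).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be strictly positive")

    total = 0.0
    for o, e in zip(observed, expected):
        if o == 0:
            # Empty cells are common in sparse tables. They contribute 0
            # when lam > -1 and make the statistic infinite when lam <= -1.
            if lam <= -1:
                return math.inf
            continue
        if lam == 0:
            total += o * math.log(o / e)        # deviance term
        elif lam == -1:
            total += e * math.log(e / o)        # limit case lam -> -1
        else:
            total += o * ((o / e) ** lam - 1.0) / (lam * (lam + 1.0))
    return 2.0 * total


# Illustrative (made-up) sparse table: 10 observations over 4 cells.
observed = [3, 0, 1, 6]
expected = [2.5, 2.5, 2.5, 2.5]

pearson = power_divergence(observed, expected, 1.0)
deviance = power_divergence(observed, expected, 0.0)
```

The explicit handling of empty cells is the point of interest in the sparse setting: members of the family with λ ≤ −1 become infinite as soon as a single cell is empty, while Pearson’s χ² and the deviance remain finite.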

Sampling Distributions in the Sparse Case
Divergences and Goodness-of-Fit
The Power-Divergence Family
Literature Review
Links with Information Geometry
Extended Multinomial Case
Simulation Studies
Transition Between Discrete and Continuous Features of Sampling Distributions
Findings
Discussion
