Improved inference for the signal significance

I Volobouev, A Trindade

doi:10.1088/1748-0221/13/12/p12011

Abstract

We study the properties of several likelihood-based statistics commonly used in testing for the presence of a known signal under a mixture model with known background but unknown signal fraction. Under the null hypothesis of no signal, all statistics follow a standard normal distribution in large samples, but substantial deviations can occur at practically relevant sample sizes. Approximations for the respective p-values are derived to various orders of accuracy using the methodology of Edgeworth expansions. Adherence to normality is studied, and the magnitude of deviations is quantified according to the resulting p-value inflation or deflation. We find that approximations to third-order accuracy are generally sufficient to guarantee p-values with nominal false positive error rates in the 5σ range (p-value = 2.87 × 10⁻⁷) for the classic Wald, score, and likelihood ratio (LR) statistics at relatively low sample sizes. Not only does LR have better adherence to normality, but it also consistently outperforms all other statistics in terms of false negative error rates. The reasons for this are shown to be connected with high-order cumulant behavior gleaned from fourth-order Edgeworth expansions. Finally, a conservative procedure is suggested for making finite-sample adjustments while accounting for the look-elsewhere effect via the theory of random fields.
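
As a quick check on the 5σ figure quoted above, the following minimal sketch converts a significance level in units of σ into a one-sided upper-tail p-value of the standard normal distribution. The function name `one_sided_p_value` is a hypothetical helper introduced here for illustration only; it reflects the large-sample normal approximation and none of the finite-sample Edgeworth corrections studied in the paper.

```python
import math


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal Z.

    Hypothetical helper for illustration; uses the identity
    P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # 5 sigma corresponds to roughly 2.87e-7, matching the abstract.
    print(f"{one_sided_p_value(5.0):.3e}")
```

The complementary error function is used rather than `1 - cdf` because it keeps full relative precision for very small tail probabilities.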

Full Text