Abstract
To detect weak and sparse signals from a set of n input p-values, the Higher Criticism (HC) type statistics, the Berk-Jones (B-J) type statistics, and the phi-divergence statistics share the same asymptotic optimality as n goes to infinity. However, their performance can differ significantly in practical data analysis, where n is always finite and can be very small. To address this problem in a broader context, this paper introduces a general family of goodness-of-fit statistics, called the gGOF, which unifies a broad class of signal-detection statistics, including these optimal ones. Efficient and accurate analytical calculations of the distributions of the gGOF statistics are provided under arbitrary i.i.d. continuous models for the null and alternative hypotheses. Based on these calculations, a systematic power study reveals that in the finite-n case the number of signals is often more relevant than the signal proportion. The HC and the reverse HC have advantages for relatively sparser and denser signals, respectively, while the B-J is more robust. A general framework is given for applying the gGOF to data analysis based on generalized linear models. An application to a SNP-set based genome-wide association study (GWAS) of Crohn's disease shows that these optimal statistics have good potential for detecting novel disease genes with weak SNP effects. The calculations have been implemented in the R package SetTest, which is available on CRAN.
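As a concrete illustration of the statistics named above, the following minimal sketch computes the HC and one-sided B-J statistics from a vector of p-values. It uses the commonly cited textbook forms, which are not necessarily the exact gGOF parametrization of the paper. The function names, the `alpha0` search-range parameter, and the choice of Python (rather than the SetTest R package) are assumptions made for illustration only.

```python
import numpy as np


def _sorted_pvalues(pvalues):
    """Return sorted p-values, n, and the ranks 1..n (assumes 0 < p < 1)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    return p, n, np.arange(1, n + 1)


def higher_criticism(pvalues, alpha0=0.5):
    """HC statistic: max over the smallest alpha0*n p-values of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p, n, i = _sorted_pvalues(pvalues)
    k = max(1, int(np.floor(alpha0 * n)))  # hypothetical search range
    z = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return float(np.max(z[:k]))


def berk_jones(pvalues, alpha0=0.5):
    """One-sided B-J statistic: max of n * K(i/n, p_(i)) over ranks where
    p_(i) < i/n, with K the Bernoulli Kullback-Leibler divergence."""
    p, n, i = _sorted_pvalues(pvalues)
    k = max(1, int(np.floor(alpha0 * n)))
    u = i / n
    with np.errstate(divide="ignore", invalid="ignore"):
        # The second KL term is defined as 0 when u == 1.
        tail = np.where(u < 1.0, (1.0 - u) * np.log((1.0 - u) / (1.0 - p)), 0.0)
    kl = u * np.log(u / p) + tail
    # One-sided version: only p-values smaller than expected contribute.
    stat = np.where(p < u, n * kl, 0.0)
    return float(np.max(stat[:k]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    null_p = rng.uniform(size=50)           # no signal
    signal_p = null_p.copy()
    signal_p[:3] = [1e-4, 5e-4, 1e-3]       # three hypothetical signals
    print("HC  null / signal:", higher_criticism(null_p), higher_criticism(signal_p))
    print("B-J null / signal:", berk_jones(null_p), berk_jones(signal_p))
```

Both functions return only the test statistic. Turning it into a p-value requires the null distribution of the statistic, which is the calculation the paper provides analytically and which this sketch does not attempt.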