Abstract

In modern data science, the higher criticism (HC) method is effective for detecting rare and weak signals. Its computation, however, has long been an issue when the number of p-values combined (K) and/or the number of repeated HC tests (N) is large. Several computing methods have been developed, but they all have significant shortcomings, especially when a stringent significance level is required. In this article, we propose an accurate and highly efficient computing strategy for four variations of HC. Specifically, we propose an unbiased cross-entropy-based importance sampling method (IS_CE) to benchmark all existing computing methods, and we develop a modified SetTest method (MST) that resolves the numerical issues of the existing SetTest approach. We further develop an ultra-fast approach (UFI) that combines pre-calculated statistical tables with cubic spline interpolation. Finally, following extensive simulations, we provide a computing strategy that integrates MST, UFI, and other existing methods in the R package “HCp” for virtually any K and very small p-values (~10^-20). The method is applied to a COVID-19 disease surveillance example for spatio-temporal outbreak detection from case numbers over 804 days in 3342 counties in the United States. The results confirm the viability of the computing strategy for large-scale inference. Supplementary materials for this article are available online.
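For context, the quantity whose p-value these methods approximate is the Donoho-Jin higher criticism statistic computed from K sorted p-values. The R sketch below is a minimal illustration of that statistic only; it is not the HCp package API, and the truncation fraction alpha0 = 0.5 and the function name hc_stat are assumptions made for illustration.

```r
## Minimal sketch (assumed helper, not the HCp package) of the
## Donoho-Jin higher criticism statistic:
##   HC = max over small i of sqrt(K) * (i/K - p_(i)) / sqrt(p_(i)(1 - p_(i)))
hc_stat <- function(pvals, alpha0 = 0.5) {
  K <- length(pvals)
  p <- sort(pvals)                 # order statistics p_(1) <= ... <= p_(K)
  i <- seq_len(K)
  z <- sqrt(K) * (i / K - p) / sqrt(p * (1 - p))
  max(z[i <= ceiling(alpha0 * K)]) # maximize over the leading alpha0 fraction
}

## Example: K = 1000 p-values simulated under the global null (uniform)
set.seed(1)
hc_stat(runif(1000))
```

Assessing the significance of an observed value of this statistic, especially at very stringent levels, is the computational problem the proposed strategy addresses.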
