Statistical dependence between hypotheses poses a significant challenge to the stability of large-scale multiple hypothesis testing. Ignoring it often results in an unacceptably large spread in the false positive proportion, even when its average value is acceptable (Fan et al., J Amer Statist Assoc 107(499): 1019–1035, 2012; Owen, J R Stat Soc Ser B 67(3): 411–426, 2005; Qiu et al., Stat Appl Genet Mol Biol 4: 32, 2005; Schwartzman and Lin, Biometrika 98(1): 199–214, 2011). However, the dependence structure of the data is often unknown. Using a generic signal-processing model, Bayesian multiple testing, and simulations, we demonstrate that the variance of the false positive proportion can be substantially reduced even under unknown short-range dependence. We do this by modeling the data-generating process as a stationary ergodic binary signal process embedded in noisy observations. We derive the conditional probabilities needed for Bayesian multiple testing by incorporating nearby observations into a second-order Taylor series approximation. Simulations under general conditions are carried out to assess the validity of the approach and the variance reduction it achieves. Along the way, we address the problem of sampling a random Markov matrix with a specified stationary distribution and lower bounds on the top absolute eigenvalues, which is of interest in its own right.
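To make the last point concrete, the following is a minimal sketch of one way to draw a random Markov matrix with a prescribed stationary distribution, and then to simulate a binary signal embedded in Gaussian noise. It uses a Metropolis-type construction as a stand-in, which is not necessarily the sampler developed in the paper: it yields reversible chains only and does not enforce the lower bounds on the top absolute eigenvalues mentioned above. The function names, the target distribution `pi`, the signal strength `mu`, and the unit-variance Gaussian noise are all assumptions made for illustration.

```python
import numpy as np


def random_markov_matrix(pi, rng):
    """Draw a row-stochastic P with pi @ P == pi (Metropolis-type sketch).

    Assumes pi has at least two entries, all strictly positive, summing to 1.
    """
    pi = np.asarray(pi, dtype=float)
    k = pi.size
    # Random symmetric proposal with zero diagonal.
    a = rng.random((k, k))
    q = (a + a.T) / 2.0
    np.fill_diagonal(q, 0.0)
    # Scale so every row sums to at most 0.9; this leaves at least 0.1
    # on each diagonal entry of P, keeping the chain aperiodic.
    q *= 0.9 / q.sum(axis=1).max()
    # Metropolis acceptance: ratio[i, j] = pi[j] / pi[i].
    ratio = pi[None, :] / pi[:, None]
    p = q * np.minimum(1.0, ratio)
    # Put the remaining mass on the diagonal so each row sums to 1.
    np.fill_diagonal(p, 1.0 - p.sum(axis=1))
    return p


def simulate_observations(p, pi, n, mu, rng):
    """Simulate a stationary chain with transition matrix p, plus N(0, 1) noise.

    With two states, the state sequence is a binary signal (0 = null, 1 = signal)
    and the observation at time t is mu * state[t] + noise[t].
    """
    k = len(pi)
    states = np.empty(n, dtype=int)
    states[0] = rng.choice(k, p=pi)  # start from the stationary distribution
    for t in range(1, n):
        states[t] = rng.choice(k, p=p[states[t - 1]])
    observations = mu * states + rng.standard_normal(n)
    return states, observations


rng = np.random.default_rng(0)
pi = np.array([0.9, 0.1])  # hypothetical: 10% of sites carry a signal
P = random_markov_matrix(pi, rng)
signal, x = simulate_observations(P, pi, n=1000, mu=2.0, rng=rng)
```

The construction satisfies detailed balance, `pi[i] * P[i, j] == pi[j] * P[j, i]`, which is what guarantees that `pi` is stationary. Controlling the spectrum of `P` as well, as the abstract describes, would require a different or extended sampler.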