Large-scale multiple testing, which involves conducting tens of thousands of hypothesis tests simultaneously, has been applied in many scientific fields. Most conventional multiple testing procedures focus on controlling the false discovery rate (FDR) and largely ignore covariate information and the dependence structure among tests. An FDR control procedure, termed the Covariate-Modulated Local Index of Significance (cmLIS) procedure, has been proposed; it accounts for local correlations among tests and accommodates covariate information by leveraging a covariate-modulated hidden Markov model (HMM). In the oracle case, where all parameters of the covariate-modulated HMM are known, the cmLIS procedure is shown to be valid and optimal in some sense. Two Bayesian sampling algorithms are provided for parameter estimation, one for the case where the number of mixture components in the non-null distribution is known and one for the case where it is unknown. Extensive simulations are conducted to demonstrate the effectiveness of the cmLIS procedure relative to state-of-the-art multiple testing procedures. Finally, the cmLIS procedure is applied to an RNA sequencing data set and a schizophrenia (SCZ) data set.
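The abstract does not spell out the rejection rule. As a minimal sketch, assuming the cmLIS procedure follows the usual LIS-style step-up rule (rank hypotheses by their posterior null probability, then reject the largest set whose average value stays at or below the target FDR level), the thresholding step might look like the following. The function name `lis_step_up` and the example inputs are hypothetical, and the LIS values are assumed to have been computed already from the fitted covariate-modulated HMM.

```python
def lis_step_up(lis, alpha=0.1):
    """Illustrative LIS-style step-up rule (an assumption, not the paper's code).

    lis   : list of posterior null probabilities, one per hypothesis
    alpha : target FDR level
    Returns a list of booleans, True where the hypothesis is rejected.
    """
    # Indices of hypotheses, ordered from smallest to largest LIS value.
    order = sorted(range(len(lis)), key=lambda i: lis[i])

    running_sum = 0.0
    k = 0  # number of hypotheses to reject
    for j, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # Keep the largest j whose running average is within the FDR level.
        if running_sum / j <= alpha:
            k = j

    rejected = [False] * len(lis)
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical LIS values for five tests.
example = lis_step_up([0.01, 0.9, 0.05, 0.6, 0.02], alpha=0.1)
```

Because the values are sorted in ascending order, the running average never decreases, so the last index that satisfies the bound is also the largest valid rejection set.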