Abstract

Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini–Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wide association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.
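The plug-in estimator described in the abstract can be sketched in a few lines: compute Benjamini–Hochberg adjusted p-values and multiply each by the corresponding covariate-specific null-proportion estimate, capping at 1. This is a minimal Python illustration of the idea, not the swfdr implementation; the function names are hypothetical.

```python
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adj = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    adj = np.minimum.accumulate(adj[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out

def plug_in_fdr(p, pi0):
    """Plug-in FDR: per-test null-proportion estimate pi0(x_i) times the
    BH adjusted p-value, capped at 1."""
    return np.minimum(np.asarray(pi0, dtype=float) * bh_adjust(p), 1.0)
```

With a constant pi0 of 1 this reduces to the usual BH adjustment; a pi0 estimate below 1 (overall or covariate-specific) makes the procedure less conservative.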

Highlights

  • Multiple testing is a ubiquitous issue in modern scientific studies

  • We only report results for the Scott T and Scott E approaches for the z-statistic and t-statistic cases, where these statistics are input directly into the methods implemented in the FDRreg package

  • We have introduced an approach to estimating the false discovery rate (FDR) conditional on covariates in a multiple testing framework, by first estimating the proportion of true null hypotheses via a regression model (a method implemented in the swfdr package) and then using this estimate in a plug-in estimator
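The regression step in the last highlight, estimating the proportion of true nulls as a function of covariates, can be sketched as follows: regress the indicator 1{p > λ} on the covariates and rescale the fitted values by 1/(1 − λ), since null p-values are uniform. This is a bare-bones illustration of the idea behind swfdr's lm_pi0 and not its implementation; the function name, the use of plain Newton-iterated logistic regression, and the default λ are all assumptions made for this sketch.

```python
import numpy as np

def lm_pi0_sketch(p, x, lam=0.8, n_iter=25):
    """Estimate pi0(x) by logistic regression of the indicator 1{p > lam}
    on a single covariate, rescaled by 1/(1 - lam) and clipped to [0, 1].
    Illustrative sketch only, not the swfdr::lm_pi0 implementation."""
    p = np.asarray(p, dtype=float)
    X = np.column_stack([np.ones(len(p)), np.asarray(x, dtype=float)])
    y = (p > lam).astype(float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):  # Newton-Raphson for the logistic MLE
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        W = mu * (1.0 - mu)
        # small ridge term keeps the Hessian invertible
        H = X.T @ (W[:, None] * X) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.clip(mu / (1.0 - lam), 0.0, 1.0)
```

In the BMI GWAS application the covariates would be per-SNP sample size and MAF; here a single generic covariate stands in for both.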



INTRODUCTION

Multiple testing is a ubiquitous issue in modern scientific studies. Microarrays (Schena et al, 1995), next-generation sequencing (Shendure & Ji, 2008), and high-throughput metabolomics (Lindon, Nicholson & Holmes, 2011) make it possible to simultaneously test the relationship between hundreds or thousands of biomarkers and an exposure or outcome of interest. There are a variety of situations where meta-data could be valuable for improving the decision of whether a hypothesis should be rejected in a multiple testing framework; our focus is on an example from the meta-analysis of data from a genome-wide association study (GWAS) for body mass index (BMI) (Locke et al, 2015). Using standard approaches such as that of Storey (2002), we can estimate the fraction of single nucleotide polymorphisms (SNPs; genomic positions, or loci, which show between-individual variability) which are not truly associated with BMI, and use this fraction in an adaptive FDR procedure. Our results are consistent with intuition: larger sample sizes and larger minor allele frequencies (MAFs) lead to a smaller fraction of SNPs estimated to be null. The covariate-based estimates do, however, allow for improved quantification of this relationship. For example, the range of π̂0(xi) is relatively wide ((0.697, 1) for the final smoothed estimate), whereas the smoothed estimate of π0 without covariates, obtained via the Storey (2002) approach, is a single number for the entire collection of hypotheses. In the simulations, scenario II specifies π0 as a smooth function of one variable, similar to Fig. 2 (MAF in [0.302, 0.500]), and scenario III specifies π0(x1) …
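The Storey (2002) estimate of the overall null proportion referenced above admits a compact sketch. Under the null, p-values are uniform on [0, 1], so roughly π0(1 − λ)m of the m p-values should exceed a threshold λ; solving for π0 gives the estimator below. This is an illustrative implementation, not the one in the swfdr or qvalue packages, and the function name is an assumption.

```python
import numpy as np

def storey_pi0(p, lam=0.5):
    """Storey (2002) null-proportion estimate:
    pi0_hat = #{p_i > lam} / ((1 - lam) * m), truncated at 1.
    Uses the fact that null p-values are uniform, so a fraction
    (1 - lam) of them is expected to land above lam."""
    p = np.asarray(p, dtype=float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))
```

The choice of λ trades off bias (small λ counts some alternatives as nulls) against variance (large λ leaves few p-values above the threshold); smoothing the estimate over a grid of λ values is a common refinement.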

[Figure: simulation results; panels include (C) Scenario III and (E) Scenario V.]
DISCUSSION
