Abstract

The false discovery rate (FDR) measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the FDR. We develop a new framework for formulating and estimating FDRs and q-values when an additional piece of information, which we call an “informative variable”, is available. For a given test, the informative variable provides information about the prior probability that a null hypothesis is true or about the power of that particular test. The FDR is then treated as a function of this informative variable. We consider two applications in genomics. Our first application is a genetics of gene expression (eQTL) experiment in yeast where every pair of genetic marker and gene expression trait is tested for association. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.

Highlights

  • Multiple testing is routinely conducted in many scientific areas

  • We first introduce our model, provide formulas for the positive false discovery rate, positive false nondiscovery rate and q-value, and describe the significance rule based on the q-value

  • To investigate how the performance of the functional FDR (fFDR) method is affected by the informative variable Z through the conditional p-value density $f_1(p \mid z)$ under the false null hypothesis, we construct two types of $f_1(p \mid z)$: (i) dependent: $f_1(p \mid z) \propto p^{\alpha_0 - 1}(1 - p)^{\beta_0 - 1}$, where $\alpha_0 = 0.3$, $\gamma_0 = 4.5$ and $\beta_0 = \gamma_0 + 1.4z$; the scalar 1.4 helps reduce the chance of generating large p-values under the false null hypothesis; (ii) independent: $f_1(p \mid z) \propto p^{\alpha_0 - 1}(1 - p)^{\gamma_0 - 1}$ with $\alpha_0 = 0.3$ and $\gamma_0 = 4.5$, so that $f_1(p \mid z)$ does not depend on $z$
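
The two alternative densities in the last highlight are Beta densities, so p-values under the false null hypothesis can be drawn directly from a Beta distribution. The following is a minimal sketch of that sampling step only, not the authors' simulation code. The function name `sample_alt_pvalues` is hypothetical, and taking z uniform on (0, 1) in the usage lines is an assumption, since the distribution of Z is not stated here.

```python
import numpy as np

ALPHA0 = 0.3   # first Beta shape parameter (alpha_0 in the text)
GAMMA0 = 4.5   # baseline second shape parameter (gamma_0 in the text)
SLOPE = 1.4    # the scalar 1.4 that multiplies z in the dependent case


def sample_alt_pvalues(z, dependent=True, rng=None):
    """Draw one p-value per entry of z from f1(p|z) under the false null.

    dependent=True  : p ~ Beta(ALPHA0, GAMMA0 + SLOPE * z)
    dependent=False : p ~ Beta(ALPHA0, GAMMA0), so z is ignored
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    if dependent:
        beta0 = GAMMA0 + SLOPE * z          # beta_0 = gamma_0 + 1.4 z
    else:
        beta0 = np.full_like(z, GAMMA0)     # second shape fixed at gamma_0
    return rng.beta(ALPHA0, beta0)


# Usage: z uniform on (0, 1) is an assumption made for illustration.
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=10_000)
p_dep = sample_alt_pvalues(z, dependent=True, rng=rng)
p_ind = sample_alt_pvalues(z, dependent=False, rng=rng)
```

Because a Beta(a, b) variable has mean a / (a + b), a larger z pushes the dependent p-values toward zero, which matches the stated role of the scalar 1.4.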


Summary

Introduction

Multiple testing is routinely conducted in many scientific areas. For example, in genomics, RNA-seq technology is often used to test thousands of genes for differential expression among two or more biological conditions. Additional information on the status of a null hypothesis or the power of a test may be available to help better estimate the FDR and q-value. In the RNA-seq study, the p-values are subdivided into six different strata of per-gene read depth. It can be seen in both cases that the proportion of true null hypotheses and the power to identify significant tests vary in a systematic manner across the strata. We code additional information into a quantitative informative variable and extend the Bayesian framework for FDR pioneered in Storey (2003) to incorporate this informative variable. This leads to a functional proportion of true null hypotheses (or “functional null proportion” for short) and a functional power function.
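
To make the stratification idea concrete, the sketch below splits p-values into six quantile-based strata of an informative variable and estimates the proportion of true nulls within each stratum. It uses the standard estimator #{p > λ} / (m(1 − λ)) applied stratum by stratum, which is a simple stand-in and not the functional null proportion estimator developed in the paper. The function names, the choice λ = 0.5, and the use of quantile-based strata are all assumptions made for illustration.

```python
import numpy as np


def storey_pi0(pvals, lam=0.5):
    """Estimate the null proportion as #{p > lam} / (m * (1 - lam)), capped at 1."""
    pvals = np.asarray(pvals, dtype=float)
    return min(1.0, float(np.mean(pvals > lam)) / (1.0 - lam))


def pi0_by_stratum(pvals, z, n_strata=6, lam=0.5):
    """Estimate the null proportion separately within quantile strata of z.

    Assumes z is continuous, so every stratum is non-empty.
    """
    pvals = np.asarray(pvals, dtype=float)
    z = np.asarray(z, dtype=float)
    # Stratum boundaries at equally spaced quantiles of z.
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_strata + 1))
    # Map each z to a stratum index in 0..n_strata-1 (the maximum is clipped in).
    idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_strata - 1)
    return np.array([storey_pi0(pvals[idx == k], lam) for k in range(n_strata)])
```

If the per-stratum estimates change systematically from the lowest to the highest stratum, that is the kind of pattern that motivates treating the null proportion as a function of the informative variable rather than as a single constant.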

The functional FDR framework
Optimal statistic
Q-value based decision rule
Implementation of the fFDR methodology
Estimating the functional null proportion
FDR and q-value estimation
Simulation study
Simulation design
Simulation results
Applications in genomics
Background on the eQTL experiment
Background on the RNA-seq study
Estimating the functional null proportion in the two studies
Application of fFDR method in the two studies
Discussion
Findings
A Choice of tuning parameter for estimators of the functional null proportion