Abstract

Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding several significant results among many tests. In this work, we introduce a modified version of the FDR called the "positive false discovery rate" (pFDR). We discuss the advantages and disadvantages of the pFDR and investigate its statistical properties. When assuming the test statistics follow a mixture distribution, we show that the pFDR can be written as a Bayesian posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. Also, a new quantity called the "q-value" is introduced and investigated, which is a natural "Bayesian posterior p-value," or rather the pFDR analogue of the p-value.

1. Introduction

When testing a single hypothesis, one is usually concerned with controlling the false positive rate while maximizing the probability of detecting an effect when one really exists. In statistical terms, we maximize the power conditional on the Type I error rate being at or below some level. The field of multiple hypothesis testing tries to extend this basic paradigm to the situation where several hypotheses are tested simultaneously. One must define an appropriate compound error measure according to the rate of false positives one is willing to encounter. Then a procedure is developed that allows one to control the error rate at a desired level, while maintaining the power of each test as much as possible.
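The FDR described above is the expectation of the false discovery proportion V/R, where V is the number of false positives and R the total number of rejections; the pFDR conditions on R > 0. A minimal simulation sketch can make the quantity concrete (the mixture of uniform null p-values and Beta-distributed alternative p-values, and the parameter choices, are illustrative assumptions, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: m tests, a fraction pi0 of which are true nulls.
m, pi0 = 1000, 0.8
is_null = rng.random(m) < pi0

# p-values: uniform under the null, stochastically small under the
# alternative (Beta(0.1, 1) is just an illustrative choice).
p = np.where(is_null, rng.random(m), rng.beta(0.1, 1.0, m))

# Reject at a fixed threshold and compute the realized false discovery
# proportion V/R; the FDR is the expectation of this quantity, and the
# pFDR is the expectation conditional on R > 0.
threshold = 0.05
rejected = p <= threshold
V = int(np.sum(rejected & is_null))   # false positives among rejections
R = int(np.sum(rejected))             # total rejections
fdp = V / R if R > 0 else 0.0
print(R, V, round(fdp, 3))
```

Averaging `fdp` over many replications of this simulation would estimate the FDR at the chosen threshold.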
The most commonly controlled quantity when testing multiple hypotheses is the family-wise error rate (FWER), which is the probability of yielding one or more false positives out of all hypotheses tested. The most familiar example of this is the Bonferroni method. If there are m hypothesis tests, each test is controlled so that the probability of a false positive is less than or equal to α/m for some chosen value of α. It then follows that the overall FWER is less than or equal to α. Many
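The Bonferroni method described above amounts to comparing each p-value against the adjusted threshold α/m. A minimal sketch (the function name and example p-values are illustrative, not from the paper):

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha/m,
    which bounds the family-wise error rate (FWER) at alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

# With m = 4 tests at alpha = 0.05, each test uses threshold 0.0125.
pvals = [0.001, 0.04, 0.012, 0.6]
print(bonferroni_reject(pvals, alpha=0.05))  # [True, False, True, False]
```

The union bound guarantees the FWER control: the probability that any of the m tests falsely rejects is at most m · (α/m) = α.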
