Abstract

Genomic methods have made statistical multiple-test methods important to geneticists and molecular biologists. These tests apply to identification of quantitative trait loci and to measurement of changes in RNA or DNA abundance by microarray methods. Recently developed multiple-test methods provide more statistical power when many of the tested null hypotheses are false. At the same time, these methods can provide stringent control of errors in cases when most or all of the tested null hypotheses are true. These methods control errors in a different way from previous hypothesis tests, controlling or estimating quantities called the posterior error rate (PER), false discovery rate (FDR), or proportion of false positives (PFP), rather than the type I error. In this study, we attempt to clarify the relationships among these methods and demonstrate how the proportion of true null hypotheses among all tested hypotheses plays an important role.

Genomic methods, those that evaluate many genes or many genomic locations for some property, often require testing a large set of statistical hypotheses, called a family of hypotheses. Such a family may include thousands of hypotheses. For example, detection of quantitative trait loci involves testing a statistical association between trait values and genotypes at several hundred marker loci (Lander and Botstein 1989). Microarray analysis of RNA expression may involve looking for changes among thousands of RNA species (Lockhart et al. 1996). Combining the two techniques (Jansen and Nap 2001; Brem et al. 2002; Schadt et al. 2003) tests pairwise associations between thousands of RNA expression patterns and genotypes at hundreds of marker loci. Naive application of standard hypothesis tests with no adjustment for multiple testing will yield large numbers of nonreproducible positive results or false discoveries (Soric 1989).
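The scale of the problem is easy to see by simulation. The following sketch (NumPy assumed; the family size and seed are arbitrary choices for illustration) tests a family in which every null hypothesis is true at an unadjusted comparisonwise level of 0.05, so every rejection is a false discovery:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical family of m tests in which every null hypothesis is true.
m = 10000

# Under H0, p-values are uniformly distributed on (0, 1).
p_values = rng.uniform(0.0, 1.0, size=m)

# Naive testing: reject each null at the comparisonwise level 0.05
# with no multiple-test adjustment.
alpha = 0.05
false_discoveries = int(np.sum(p_values < alpha))

# Every rejection here is a false discovery; we expect roughly
# alpha * m = 500 of them.
print(false_discoveries)
```

With 10,000 true null hypotheses, roughly 500 "discoveries" appear by chance alone, none of which would be reproducible.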
On the other hand, using multiple-testing methods to control the familywise type I error rate (FWER, see below) can greatly reduce the power to detect discoveries in families of tests where many such cases should be detected. In this study, we examine the relationship between the traditional type I error and the other criteria that seem more useful for genetic hypothesis testing. The issue is essentially that faced by Morton (1955) when he proposed that an LOD score of 3.0 be required to declare linkage of genetic loci in humans.

Formal statistical hypothesis tests provide a standard method for interpreting experimental data. They contrast a null hypothesis, H0, and an alternative hypothesis, H1. The null hypothesis H0 is chosen so that the probability of any experimental outcome can be calculated assuming H0 to be true. If, under H0, the probability of observing results as extreme as or more extreme than the observed results from an experiment is less than some desired value α, H0 is rejected and H1 is accepted. The value α is the desired type I error rate, or the probability of rejecting H0 when H0 is true. This value may also be called the comparisonwise type I error rate (CWER) when it refers to the rate for a single test in a family of tests.

Suppose now that there is a family of m tests and that the null hypothesis is true for m0 of the tests and false for m1 = m - m0 of them. Table 1 summarizes the possible outcomes for this family of m tests. Each test yields a value x for a statistic X and a p-value, which is P(X ≥ x|H0), the probability that X would match or exceed the observed value x under the assumption that the null hypothesis is true. For m0 of the tests, those for which the null hypothesis is true, the p-values are uniformly distributed. For the other m1 tests, the distribution of p-values will be stochastically smaller than a uniform distribution (i.e., P(p ≤ x|H1) ≥ x = P(p ≤ x|H0) for any x between 0 and 1).
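The two p-value distributions can be illustrated with a minimal simulation (NumPy assumed; the one-sided z-test, sample size, and effect size of 0.8 are hypothetical choices, not taken from the article):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)

n_tests = 5000
n_obs = 20

# Tests with a true null (mean 0) and tests with a false null (mean 0.8);
# unit variance is assumed known throughout.
null_data = rng.normal(0.0, 1.0, size=(n_tests, n_obs))
alt_data = rng.normal(0.8, 1.0, size=(n_tests, n_obs))

def one_sided_p(data):
    """p-value = P(X >= x | H0) for a one-sided z-test of mean > 0."""
    z = data.mean(axis=1) * np.sqrt(data.shape[1])
    # Standard normal survival function via the complementary error function.
    return np.array([0.5 * erfc(zi / sqrt(2.0)) for zi in z])

p_null = one_sided_p(null_data)
p_alt = one_sided_p(alt_data)

# Under H0 the p-values are uniform, so P(p <= x) is close to x.
# Under H1 they are stochastically smaller, so P(p <= x) >= x.
x = 0.05
print(round(float(np.mean(p_null <= x)), 3))  # near 0.05
print(round(float(np.mean(p_alt <= x)), 3))   # much larger than 0.05
```

The fraction of null p-values at or below any threshold x tracks x itself, while the alternative p-values pile up near zero, which is exactly what makes rejection regions informative.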
The ratios π0 = m0/m and π1 = m1/m, when m is large, can be interpreted as approximate Bayesian prior probabilities of the null and alternative hypotheses, respectively. The terms "discovery" (Soric 1989) or "positive result" are often used to refer to a hypothesis test in which the null hypothesis is rejected. The terms "false discovery" or "false positive" are commonly used to describe the case in which the null hypothesis is rejected, although it is, in fact, true. The power of a test is the probability that a positive result will be obtained when the null hypothesis is false. Tests with high power will tend to produce high values of S in Table 1, whereas tests with low power may produce high values of T in Table 1.

The familywise error rate (FWER), also known as the overall type I error rate, is the probability of one or more type I errors in a family of tests. In terms of the definitions in Table 1, the FWER is simply P(V > 0). Much of the past research in the area of multiple testing has focused on the development of methods that control this probability. Strong control of the FWER at level α is achieved if the FWER is less than or equal to α, regardless of the number of false null hypotheses (m1). Weak control is obtained if the FWER is less than or equal to α whenever all tested null hypotheses are true (m0 = m) and not necessarily less than or equal to α when some of the null hypotheses are false (m1 > 0). Controlling the FWER is important for tests in which the family is being tested as a unit, and the rejection of any null hypothesis affects the whole family. More generally, control of the FWER is important whenever it is necessary for an analysis to produce no false positives with high probability. When a family contains many tests, producing no false positives with high probability may require a substantial degree of conservativeness that could lead to many type II errors (i.e., large values of T in Table 1).
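The FWER, P(V > 0), can be estimated by repeating a simulated family of tests many times. The sketch below (NumPy assumed; family size and replicate count are arbitrary) compares unadjusted testing against the Bonferroni correction, a classical method that achieves strong FWER control by testing each hypothesis at level α/m:

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimate the FWER, P(V > 0), for a family of m tests in which
# all null hypotheses are true (m0 = m).
m = 100
alpha = 0.05
n_experiments = 2000

# Each row is one simulated experiment: m uniform p-values under H0.
p = rng.uniform(0.0, 1.0, size=(n_experiments, m))

# Unadjusted testing: reject whenever p < alpha.
fwer_naive = float(np.mean((p < alpha).any(axis=1)))

# Bonferroni correction: reject whenever p < alpha / m, giving
# strong control of the FWER at level alpha.
fwer_bonferroni = float(np.mean((p < alpha / m).any(axis=1)))

print(round(fwer_naive, 3))       # near 1 - (1 - 0.05)**100, about 0.994
print(round(fwer_bonferroni, 3))  # at most about 0.05
```

With 100 true nulls, an unadjusted analysis commits at least one type I error in nearly every experiment, while Bonferroni holds the FWER near α; the price of that guarantee, as the text notes, is conservativeness when many alternatives are true.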
Suppose, for example, that a method used to test each of the m genes for differential expression in a microarray experiment correctly rejects 99 false null hypotheses and incorrectly rejects one true null hypothesis (S = 99, V = 1, R = 100). From the standpoint of FWER error control, the performance of the method on this one data set would be considered in error because of the one false positive result. Such an error would be allowed to occur in only 5% of the experiments to which the method would be applied.

Corresponding author. E-MAIL Kmanly@Tennessee.edu; FAX (901) 448-7193. Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2156804. Insight/Outlook
