In multiple simultaneous hypothesis testing (MSHT), a significance thresholding function, used as a scalar statistic, can be designed adaptively by sharing information among the many tests performed simultaneously. By using such an adapted statistic, MSHT achieves greater detection power than tests based on simple individual statistics. To systematically obtain an optimal thresholding function that maximizes detection power in MSHT, Storey (2007) proposed a theoretical framework called the optimal discovery procedure (ODP). He also proposed an empirical estimator of the ODP thresholding function for parametric MSHT, which presupposes parametric forms of the null and alternative likelihood functions. Empirical Bayesian testing (Efron et al. 2001), which is based on a non-parametric treatment of arbitrary test statistics, has sometimes exhibited power comparable to the ODP. These two MSHT frameworks appear to be closely related, but because of the difference in their approaches (frequentist vs. Bayesian), the relationship has not been well understood.

We present the new concept of an optimal sufficient statistic that links the ODP and empirical Bayesian frameworks, and we show that the local false discovery rate from the empirical Bayes approach can serve as an optimal thresholding function if a certain condition holds. We lay out exhaustive sets of assumptions under which optimal thresholding functions are achieved, and show that a thresholding function derived as optimal for a parametric MSHT problem remains optimal for a more general and broader range of MSHT problems defined in a non- or semi-parametric way. Our study thus provides a guide to designing optimal thresholding functions for general MSHT problems.
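To make the two thresholding statistics concrete, the following is a minimal toy sketch, assuming a normal-means two-group setup with illustrative parameter values (the proportion of nulls, the alternative effect sizes, and the helper names are assumptions for illustration, not quantities from the paper). It contrasts Storey's ODP statistic, a ratio of summed alternative likelihoods to summed true-null likelihoods, with the empirical-Bayes local false discovery rate, pi0 * f0(x) / f(x), of Efron et al. (2001).

```python
# Toy comparison of two MSHT thresholding statistics under a
# normal-means model. All parameter values are illustrative.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def odp_statistic(x, null_mus, alt_mus):
    """ODP-style statistic (Storey 2007): summed alternative
    likelihoods over summed true-null likelihoods at x."""
    num = sum(normal_pdf(x, mu) for mu in alt_mus)
    den = sum(normal_pdf(x, mu) for mu in null_mus)
    return num / den

def local_fdr(x, pi0, alt_mus):
    """Empirical-Bayes local fdr (Efron et al. 2001):
    pi0 * f0(x) / f(x), with f the two-group mixture density."""
    f0 = normal_pdf(x, 0.0)                                   # null density
    f1 = sum(normal_pdf(x, mu) for mu in alt_mus) / len(alt_mus)  # alt density
    f = pi0 * f0 + (1.0 - pi0) * f1                           # mixture
    return pi0 * f0 / f

# Seven true nulls at mu = 0 and three alternatives (assumed values).
null_mus = [0.0] * 7
alt_mus = [2.0, -2.0, 3.0]
for x in (0.5, 2.5):
    print(x, odp_statistic(x, null_mus, alt_mus), local_fdr(x, 0.7, alt_mus))
```

In this symmetric toy setup the two statistics order observations in opposite directions: points with a large ODP statistic have a small local fdr, which is the kind of monotone correspondence the optimal-sufficient-statistic argument formalizes.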