Abstract

BackgroundWe consider effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular the False Discovery Rate (FDR) control procedures. Recent simulation studies consider only simple correlation structures among variables, which is hardly inspired by real data features. Our aim is to systematically study effects of several network features like sparsity and correlation strength by imposing dependence structures among variables using random correlation matrices.ResultsWe study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as Benjamin-Hochberg FDR, Storey's q-value, SAM and resampling based FDR procedures. False Non-discovery Rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR to the level claimed under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable.ConclusionWe discuss a new method for efficient guided simulation of dependent data, which satisfy imposed network constraints as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criterions and is useful for testing a potentially new method on π0 or FDR estimation in a dependency context.

Highlights

  • We consider effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular the False Discovery Rate (FDR) control procedures

  • We show how the dependence parameter can serve as a measure of FDR behavior under correlation-based dependence conditions

  • We show that our parameter M, the number of rows of the initial Z matrix, may be used to control it, which is suggested by the asymptotic relation, as given in equation (4)

Read more

Summary

Introduction

We consider effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular the False Discovery Rate (FDR) control procedures. One performs hypothesis testing for each gene and the number of genes is usually more than thousands. In this situation, direct application of single hypothesis testing thousands times produces a large number of false discoveries. It is widely recognized that dependencies are omnipresent in many high-throughput studies. Such dependencies may be regulatory or functional as in gene pathways, and spatial such as in SNP or DNA copy number arrays because of the genomic order. Attempts to infer (page number not for citation purposes)

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call