On the Choice and Number of Microarrays for Transcriptional Regulatory Network Inference

Elissa J Cosgrove, Eric D Kolaczyk, Timothy S Gardner

doi:10.1186/1471-2105-11-454

Open Access

https://doi.org/10.1186/1471-2105-11-454

Abstract

Background: Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level. In correlation-based TRNI, network edges can in principle be evaluated using standard statistical tests. However, while such tests nominally assume independent microarray experiments, we expect dependency between the experiments in microarray compendia, due to both project-specific factors (e.g., microarray preparation, environmental effects) in the multi-project compendium setting and effective dependency induced by gene-gene correlations. Herein, we characterize the nature of dependency in an Escherichia coli microarray compendium and explore its consequences on the problem of determining which and how many arrays to use in correlation-based TRNI.

Results: We present evidence of substantial effective dependency among microarrays in this compendium, and characterize that dependency with respect to experimental condition factors. We then introduce a measure n_eff of the effective number of experiments in a compendium, and find that, corresponding to the dependency observed in this particular compendium, there is a huge reduction in effective sample size, i.e., n_eff = 14.7 versus n = 376. Furthermore, we found that the n_eff of select subsets of experiments actually exceeded the n_eff of the full compendium, suggesting that the adage 'less is more' applies here. Consistent with this latter result, we observed improved performance in TRNI using subsets of the data compared to results using the full compendium. We identified experimental condition factors that trend with changes in TRNI performance and n_eff, including growth phase and media type. Finally, using the set of known E. coli genetic regulatory interactions from RegulonDB, we demonstrated that false discovery rates (FDR) derived from n_eff-adjusted p-values were well-matched to FDR based on the RegulonDB truth set.

Conclusions: These results support utilization of n_eff as a potent descriptor of microarray compendia. In addition, they highlight a straightforward correlation-based method for TRNI with demonstrated meaningful statistical testing for significant edges, readily applicable to compendia from any species, even when a truth set is not available. This work facilitates a more refined approach to construction and utilization of mRNA expression compendia in TRNI.
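To make the idea of an effective number of experiments more concrete, the sketch below computes a generic "participation ratio", (tr C)^2 / tr(C^2), from the experiment-by-experiment correlation matrix C. This is an assumed stand-in chosen for illustration only: the abstract does not state the formula behind its effective-number measure, and the function names `pearson` and `effective_number` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def effective_number(experiments):
    """Participation ratio (tr C)^2 / tr(C^2) of the correlation matrix C.

    `experiments` is a list of experiments, each a list of expression
    values over the same genes. NOTE: this is a generic measure assumed
    for illustration, not necessarily the definition used in the paper.
    """
    n = len(experiments)
    corr = [[pearson(experiments[i], experiments[j]) for j in range(n)]
            for i in range(n)]
    trace = sum(corr[i][i] for i in range(n))
    # For a symmetric matrix, tr(C^2) equals the sum of squared entries.
    trace_of_square = sum(c * c for row in corr for c in row)
    return trace ** 2 / trace_of_square


# Three mutually uncorrelated experiments versus three redundant ones.
independent = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
redundant = [[1, -1, 1, -1], [2, -2, 2, -2], [3, -3, 3, -3]]
n_eff_independent = effective_number(independent)
n_eff_redundant = effective_number(redundant)
```

Under this stand-in measure, mutually uncorrelated experiments count fully (the value equals the number of experiments), while perfectly correlated experiments collapse to a single effective experiment. This mirrors the qualitative point of the abstract that a large compendium can carry far fewer effective samples than its nominal size.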

Highlights

Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level.
With the availability of genome-wide mRNA expression data from DNA microarray experiments, transcriptional regulatory network inference (TRNI) from large compendia of these microarrays has become a fundamental task in computational systems biology.
We found that p_eff = 14.66 for this compendium, a drastic reduction compared to the number of genes p = 4298.

Summary

Introduction

Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level. In correlation-based TRNI, network edges can in principle be evaluated using standard statistical tests. While such tests nominally assume independent microarray experiments, we expect dependency between the experiments in microarray compendia, due to both project-specific factors (e.g., microarray preparation, environmental effects) in the multi-project compendium setting and effective dependency induced by gene-gene correlations. With the availability of genome-wide mRNA expression data from DNA microarray experiments, TRNI from large compendia of these microarrays has become a fundamental task in computational systems biology. In this approach, TF-gene interactions are predicted. While many of these approaches have relied on user-defined or truth set-based thresholds for determining the network, the correlation- and partial correlation-based methods can in principle calibrate established tests to a desired level of prediction accuracy via control of the false discovery rate (FDR) alone. Such tests nominally assume independent and identically distributed (i.i.d.) microarray experiments. The (effective) dependency between experiments invalidates the assumption of i.i.d. experiments upon which the statistical tests are based, thereby complicating the calibration of these tests.
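The effect of the sample-size assumption on edge testing can be sketched as follows. The example assumes a Fisher z-transform test for a zero correlation and the Benjamini-Hochberg procedure for FDR control; both are standard choices picked here for illustration, since the text above does not specify which test the authors used. The function names are hypothetical, and the values 376 and 14.7 are the nominal and effective sample sizes quoted in the abstract.

```python
import math


def fisher_z_pvalue(r, n_samples):
    """Two-sided p-value for H0: correlation = 0 via the Fisher z-transform.

    Assumes -1 < r < 1 and n_samples > 3. `n_samples` may be a nominal
    count or an effective (non-integer) sample size.
    """
    z = math.atanh(r) * math.sqrt(n_samples - 3)
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# The same correlation judged with the nominal and the effective sample size.
p_nominal = fisher_z_pvalue(0.5, 376)
p_effective = fisher_z_pvalue(0.5, 14.7)

# Hypothetical p-values for four candidate TF-gene edges.
significant_edges = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], alpha=0.05)
```

In this sketch a correlation of 0.5 is overwhelmingly significant when 376 independent experiments are assumed, but falls short of the 0.05 level once the sample size is reduced to 14.7. That gap is the calibration problem described above: p-values computed under an i.i.d. assumption overstate the evidence for an edge when experiments are dependent.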

Journal: BMC Bioinformatics	Publication Date: Sep 9, 2010
Citations: 35	License type: cc-by
