Abstract

Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize the performance, data requirements, and inherent biases of different inference approaches, offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Using this integrated approach, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally tested 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.

Highlights

  • Based on descriptions provided by participants, the methods were classified into six categories: Regression, Mutual information, Correlation, Bayesian networks, Meta, and Other (Table 1)

  • We used three gold standards for performance evaluation: experimentally validated interactions from a curated database (RegulonDB[16]) for E. coli; a high-confidence set of interactions supported by genome-wide transcription factor binding data[17] (ChIP-chip) and evolutionarily conserved binding motifs[18] for S. cerevisiae; and the known network for the in silico dataset (Methods)

  • We assessed method performance for the E. coli, S. cerevisiae, and in silico datasets using the area under the precision-recall (AUPR) and receiver operating characteristic (AUROC) curves[14], and an overall score that summarizes the performance across the three networks (Methods and Supplementary Note 4)
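The two ranking metrics named in the last highlight can be illustrated with a small sketch. This is not the scoring code used in the assessment. It assumes each method submits a confidence-ranked list of (regulator, target) pairs without ties, that the ranking covers every gold-standard pair, and that pairs absent from the gold standard are ignored. The function name `evaluate_ranking` and the toy edges are hypothetical, and AUPR is computed here as a simple step-wise sum (average precision), which may differ from the interpolation described in the Methods.

```python
def evaluate_ranking(ranked_edges, positives, negatives):
    """Return (AUPR, AUROC) for a confidence-ranked list of edges.

    ranked_edges: edges sorted from most to least confident (no ties assumed).
    positives:    set of gold-standard true edges.
    negatives:    set of gold-standard non-edges.
    Edges in neither set are skipped (an assumption of this sketch).
    """
    n_pos, n_neg = len(positives), len(negatives)
    tp = fp = 0
    aupr = 0.0
    auroc = 0.0
    for edge in ranked_edges:
        if edge in positives:
            tp += 1
            # Step-wise AUPR: precision at each recovered true edge,
            # weighted by the recall gained (1 / n_pos).
            aupr += (tp / (tp + fp)) / n_pos
        elif edge in negatives:
            fp += 1
            # Step-wise AUROC: each false positive adds a column of
            # width 1 / n_neg and height equal to the current recall.
            auroc += (tp / n_pos) / n_neg
    return aupr, auroc


# Toy example with hypothetical genes A, B, C.
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
gold_positive = {("A", "B"), ("B", "C")}
gold_negative = {("A", "C"), ("C", "A")}

aupr, auroc = evaluate_ranking(ranked, gold_positive, gold_negative)
print(f"AUPR = {aupr:.3f}, AUROC = {auroc:.3f}")
```

In this toy case three of the four (true edge, non-edge) pairs are ordered correctly, so the AUROC is 0.75. AUPR is more sensitive to errors near the top of the list, which matters when true edges are rare.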

Introduction

“The wisdom of crowds” refers to the phenomenon in which the collective knowledge of a community is greater than the knowledge of any individual[1]. Genome-scale inference of transcriptional gene regulation has become possible with the advent of high-throughput technologies such as microarrays and RNA sequencing, as they provide snapshots of the transcriptome under many tested experimental conditions. From these data, the challenge is to computationally predict direct regulatory interactions between a transcription factor and its target genes; the aggregate of all predicted interactions comprises the gene regulatory network. A wide range of network inference methods have been developed to address this challenge, from those exclusive to gene expression data[2,3] to methods that integrate multiple classes of data[4,5,6,7]. These approaches have been successfully used to address many biological problems[8,9,10,11], yet when applied to the same data, they can generate quite disparate sets of predicted interactions[2,3]. [...] Dream.broadinstitute.org), which allows researchers to apply top-performing inference methods and construct consensus networks.
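One simple way to build a consensus network from several disparate prediction lists is to average each edge's rank across methods. The sketch below illustrates that idea only. The text above does not state which integration scheme was used, so rank averaging, the function name `consensus_by_average_rank`, the penalty rank for missing edges, and the toy transcription factors and genes are all assumptions made for illustration.

```python
def consensus_by_average_rank(rankings):
    """Merge several ranked edge lists into one consensus ranking.

    rankings: list of lists, each sorted from most to least confident.
    An edge missing from a method's list is assigned that list's
    length + 1 as its rank (an assumption of this sketch).
    """
    # Map each edge to its 1-based position within every method's list.
    positions = [{edge: i + 1 for i, edge in enumerate(r)} for r in rankings]
    penalties = [len(r) + 1 for r in rankings]
    all_edges = set().union(*rankings)

    mean_rank = {}
    for edge in all_edges:
        ranks = [pos.get(edge, penalty) for pos, penalty in zip(positions, penalties)]
        mean_rank[edge] = sum(ranks) / len(ranks)

    # Lower mean rank means higher consensus confidence; ties are
    # broken by edge name so the output is deterministic.
    return sorted(all_edges, key=lambda e: (mean_rank[e], e))


# Toy predictions from three hypothetical inference methods.
method_1 = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g1")]
method_2 = [("TF1", "g2"), ("TF1", "g1"), ("TF2", "g2")]
method_3 = [("TF1", "g1"), ("TF2", "g1"), ("TF1", "g2")]

consensus = consensus_by_average_rank([method_1, method_2, method_3])
for edge in consensus:
    print(edge)
```

An edge that several methods place near the top rises in the consensus list, while an edge ranked highly by only one method is pushed down, which is one intuition for why integrated predictions can be more robust than any single method.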

