De-novo learning of genome-scale regulatory networks in S. cerevisiae.

Sisi Ma, Patrick Kemmeren, David Gresham, Alexander Statnikov, Alberto De La Fuente

doi:10.1371/journal.pone.0106479

Abstract

De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. We then used 7 performance metrics to assess the accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. While we found that de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. These transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and of the variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.
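As a rough illustration of the workflow the abstract describes (infer edges from expression data with a statistical association measure, then score them against a gold standard), the following minimal sketch may help. It is not the authors' pipeline. The abstract does not list the 18 approaches or the 7 metrics, so the choice of thresholded Pearson correlation, the three metrics (sensitivity, specificity, positive predictive value), the threshold of 0.5, the gene names, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def infer_edges(expr, tfs, targets, threshold):
    """Predict a TF -> target edge when |r| meets the threshold (assumed rule)."""
    return {(tf, g) for tf in tfs for g in targets
            if abs(pearson(expr[tf], expr[g])) >= threshold}

def evaluate(predicted, gold, tfs, targets):
    """Score predicted edges against gold-standard edges over all TF-target pairs."""
    n_candidates = len(tfs) * len(targets)
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = n_candidates - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }

# Hypothetical toy network: two transcription factors, four candidate targets.
tfs = ["TF1", "TF2"]
targets = ["G1", "G2", "G3", "G4"]
gold = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}

# Synthetic expression data: each target equals the sum of its regulators
# plus Gaussian noise; G4 has no regulator and is pure noise.
rng = random.Random(0)
n_samples = 200
expr = {tf: [rng.gauss(0, 1) for _ in range(n_samples)] for tf in tfs}
for g in targets:
    regs = [tf for tf, t in gold if t == g]
    expr[g] = [sum(expr[tf][i] for tf in regs) + rng.gauss(0, 0.5)
               for i in range(n_samples)]

predicted = infer_edges(expr, tfs, targets, threshold=0.5)
metrics = evaluate(predicted, gold, tfs, targets)

# Connectivity (number of predicted targets) of each TF in the inferred network.
connectivity = {tf: sum(1 for t, _ in predicted if t == tf) for tf in tfs}

print(metrics)
print(connectivity)
```

On clean synthetic data like this, a simple threshold separates true edges from non-edges easily. Real expression data is far noisier, which is consistent with the abstract's observation that reconstructing the entire network is challenging.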

Highlights

One of the fundamental problems of modern biology is reverse-engineering of genome-scale regulatory networks
Averaged over all statistical approaches and datasets belonging to the same data type, the best accuracy is achieved by observational data due to changes in time/environment and by compendium data, followed by perturbation data (0.87) and observational data consisting of biological wild-type replicates (0.88)
We used 7 performance metrics to assess the accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different real datasets spanning 4 data types

Summary

Introduction

One of the fundamental problems of modern biology is reverse-engineering of genome-scale regulatory networks. While there are many databases that store biological pathways (e.g., KEGG and Ingenuity Pathway Analysis), these databases are often inaccurate and/or incomplete because their knowledge is derived from a multitude of biological systems and conditions that may not correspond to the problem at hand. Pathways in these databases are also affected by the variability of the computational and experimental methods employed and by their reproducibility characteristics [1,2,3]. While modern methods in biology make it possible to perform such studies in a variety of model systems, they are typically expensive to perform on a genome scale and often infeasible in humans.

Methods

Results

Discussion

Conclusion