Abstract

De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. We then used 7 performance metrics to assess the accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. Although de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. These transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and of the variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.

Highlights

  • One of the fundamental problems of modern biology is reverse-engineering of genome-scale regulatory networks

  • Averaged over all statistical approaches and datasets belonging to the same data type, the best accuracy is achieved by observational data reflecting changes in time/environment and by compendium data, followed by perturbation data (0.87) and observational data consisting of biological wild-type replicates (0.88)

  • We used 7 performance metrics to assess the accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different real datasets spanning 4 data types


Summary

Introduction

One of the fundamental problems of modern biology is reverse-engineering of genome-scale regulatory networks. While there are many databases that store biological pathways (e.g., KEGG and Ingenuity Pathway Analysis), these databases are often inaccurate and/or incomplete because their knowledge is derived from a multitude of biological systems and conditions that may not correspond to the problem at hand. Pathways in these databases are also affected by the variability of the computational and experimental methods employed and by their reproducibility characteristics [1,2,3]. While modern methods in biology enable performing such studies in a variety of model systems, they are typically expensive to perform at genome scale and often infeasible in humans.

