Abstract

BackgroundThe regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the major efforts towards understanding and accurate modeling of living systems. However, reverse-engineering the network from gene expression profiles remains a challenge, because the data are noisy, high dimensional and sparse, and the regulation is often obscured by indirect connections.ResultsWe introduce a gene regulatory network inference algorithm ENNET, which reverse-engineers networks of transcriptional regulation from a variety of expression profiles with a superior accuracy compared to the state-of-the-art methods. The proposed method relies on the boosting of regression stumps combined with a relative variable importance measure for the initial scoring of transcription factors with respect to each gene. Then, we propose a technique for using a distribution of the initial scores and information about knockouts to refine the predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved higher accuracy than the winners of those competitions and other established methods.ConclusionsSuperior accuracy achieved on the three different benchmark data sets shows that ENNET is a top contender in the task of network inference. It is a versatile method that uses information about which gene was knocked-out in which experiment if it is available, but remains the top performer even without such information. ENNET is available for download from https://github.com/slawekj/ennet under the GNU GPLv3 license.

Highlights

  • The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes

  • We evaluated the accuracy of the methods using the Overall Score metric proposed by the authors of Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenges [35], as shown in Equation 6: Overall where paupr and pauroc are geometric means of p-values of networks constituting each DREAM challenge, relating to an area under the Precision-Recall curve (AUPR) and an area under the ROC curve (AUROC), respectively

  • Some of the methods were designed for use with knockout data, while others are developed with multifactorial data in mind, where no information is given about the nature of the perturbations

Read more

Summary

Introduction

The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. High throughput techniques allow for collecting genome-wide snapshots of gene expression across different experiments, such as diverse treatments or other perturbations to cells [2] Analyzing these data to infer the regulatory network is one of the key challenges in the computational systems biology. Most existing computational methods are examples of influence modeling, where the expression of a target transcript is modeled as a function of the expression levels of some selected transcription factors Such a model does not aim to describe physical interactions between molecules, but instead uses inductive reasoning to find a network of dependencies that could explain the regularities observed among the expression data. This knowledge is crucial to prioritize detailed studies of the mechanics of the transcriptional regulation

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.