Abstract

One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each regression problem, the expression pattern of one gene (the target gene) is predicted from the expression patterns of all the other genes (the input genes) using one of two tree-based ensemble methods, Random Forests or Extra-Trees. The importance of an input gene in predicting the target gene's expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the simulated data of the DREAM4 In Silico Multifactorial challenge, we show that GENIE3 compares favorably with existing algorithms in deciphering the genetic regulatory network of Escherichia coli. GENIE3 makes no assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
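The decomposition described in the abstract (one regression problem per target gene, tree-based importances as link scores, then one global ranking) can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the authors' GENIE3 implementation: it swaps in a simplified, hand-written Extra-Trees-style regressor (random cut-points, variance-reduction importance, no bootstrap, a hypothetical `min_leaf` stopping rule) so the example needs no third-party library. The names `genie3_sketch` and `rank_links` and the parameters `n_trees`, `k`, `min_leaf` and `seed` are hypothetical.

```python
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean (N times the variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _grow_tree(X, y, rows, inputs, importances, rng, k, min_leaf):
    """Grow one randomized regression tree (simplified Extra-Trees) on `rows`.

    Only the feature importances are needed, so the tree itself is not stored:
    each split adds its variance reduction to the chosen input's importance.
    """
    if len(rows) < 2 * min_leaf:
        return
    node_sse = _sse([y[r] for r in rows])
    if node_sse == 0.0:
        return
    best = None
    for f in rng.sample(inputs, min(k, len(inputs))):  # k random candidate inputs
        vals = [X[r][f] for r in rows]
        lo, hi = min(vals), max(vals)
        if lo == hi:
            continue
        t = rng.uniform(lo, hi)  # random cut-point instead of an optimized one
        left = [r for r in rows if X[r][f] < t]
        right = [r for r in rows if X[r][f] >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        gain = node_sse - _sse([y[r] for r in left]) - _sse([y[r] for r in right])
        if best is None or gain > best[0]:
            best = (gain, f, left, right)
    if best is None:  # no admissible split: this node becomes a leaf
        return
    gain, f, left, right = best
    importances[f] += gain
    _grow_tree(X, y, left, inputs, importances, rng, k, min_leaf)
    _grow_tree(X, y, right, inputs, importances, rng, k, min_leaf)


def genie3_sketch(expr, n_trees=50, k=None, min_leaf=5, seed=0):
    """Score every directed link i -> j from an expression matrix.

    expr: list of samples, each a list of p expression values (one per gene).
    Returns {(i, j): score}, read as "gene i putatively regulates gene j".
    """
    rng = random.Random(seed)
    p = len(expr[0])
    all_rows = list(range(len(expr)))
    scores = {}
    for j in range(p):  # one regression problem per target gene
        y = [sample[j] for sample in expr]
        sd = statistics.pstdev(y)
        if sd > 0:  # unit variance, so scores are comparable across targets
            y = [v / sd for v in y]
        inputs = [i for i in range(p) if i != j]  # all other genes are inputs
        importances = {i: 0.0 for i in inputs}
        for _ in range(n_trees):
            _grow_tree(expr, y, all_rows, inputs, importances, rng,
                       k or len(inputs), min_leaf)
        for i in inputs:
            scores[(i, j)] = importances[i] / n_trees
    return scores


def rank_links(scores):
    """Aggregate all targets' scores into one ranking of putative links."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: gene 2 is driven by gene 0; genes 1 and 3 are independent noise.
gen = random.Random(1)
data = []
for _ in range(100):
    g0, g1, g3 = gen.random(), gen.random(), gen.random()
    data.append([g0, g1, 2 * g0 + 0.05 * gen.random(), g3])

scores = genie3_sketch(data)
ranking = rank_links(scores)
print(ranking[:2])
```

Because genes 0 and 2 determine each other almost exactly in this toy data, both 0 -> 2 and 2 -> 0 end up at the top of the ranking: the importance scores alone do not reveal which direction is causal here. In practice one would likely use a library ensemble (for example scikit-learn's `ExtraTreesRegressor` or `RandomForestRegressor` and their `feature_importances_`) rather than a hand-rolled tree.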

Highlights

  • Genetic regulatory networks (GRNs) [1] are central to all biological organisms, and deciphering them is crucial to understanding the development, functioning and pathology of these organisms

  • In addition to its good performance on the synthetic data of the DREAM4 challenge, we show that GENIE3 compares favorably with existing algorithms in deciphering the genetic regulatory network of Escherichia coli

  • We focus only on the first task, which is targeted by the evaluation procedure of the DREAM4 challenge


Summary

Introduction

Genetic regulatory networks (GRNs) [1] are central to all biological organisms, and deciphering them is crucial to understanding the development, functioning and pathology of these organisms. Once a remote theoretical possibility, this deciphering has been made possible by advances in genomics, most notably high-throughput profiling of gene expression patterns with DNA microarrays. These advances have prompted the development of a plethora of GRN models and of algorithms to reverse-engineer them from expression data [2,3,4,5]. At the other end of the complexity spectrum are physical models that mimic the biological mechanisms at play, including promoter recognition, mRNA transcription and protein translation. These models, typically based on systems of ordinary or stochastic differential equations, can generate realistic behavior [7]. Hybrid methods involving ordinary differential equations have shown good performance on real-life, genome-wide GRN inference [8].
