Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge

Florian Geier, Jens Timmer, Christian Fleck

doi:10.1186/1752-0509-1-11

Abstract

Background: Cellular processes are controlled by gene-regulatory networks. Several computational methods are currently used to learn the structure of gene-regulatory networks from data. This study focuses on time series gene expression and gene knock-out data in order to identify the underlying network structure. We compare the performance of different network reconstruction methods using synthetic data generated from an ensemble of reference networks. Data requirements as well as optimal experiments for the reconstruction of gene-regulatory networks are investigated. Additionally, the impact of prior knowledge on network reconstruction as well as the effect of unobserved cellular processes is studied.

Results: We identify linear Gaussian dynamic Bayesian networks and variable selection based on F-statistics as suitable methods for the reconstruction of gene-regulatory networks from time series data. Commonly used discrete dynamic Bayesian networks perform worse, a result that can be attributed to the inevitable loss of information when expression data are discretized. Short time series generated under transcription factor knock-out are shown to be the optimal experiments for revealing the structure of gene-regulatory networks. Relative to the level of observational noise, we give estimates of the amount of gene expression data required to reconstruct gene-regulatory networks accurately. The benefit of using prior knowledge within a Bayesian learning framework is found to be limited to conditions in which little gene expression data is available. Unobserved processes, such as protein-protein interactions, induce dependencies between gene expression levels similar to those caused by direct transcriptional regulation. We show that these dependencies cannot be distinguished from transcription factor mediated gene regulation on the basis of gene expression data alone.

Conclusion: Currently available data size and data quality make the reconstruction of gene networks from gene expression data a challenge. In this study, we identify an optimal type of experiment, requirements on the quality and size of the gene expression data, and appropriate reconstruction methods for reverse engineering gene-regulatory networks from time series data.
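As an illustration of what variable selection based on F-statistics can look like for time series data, the following is a minimal sketch and not the implementation used in the study. It assumes a first-order linear model in which the expression of each gene at time t+1 is regressed on the expression of all genes at time t, and it keeps a candidate regulator only if removing it from the full model yields a large partial F-statistic. The three-gene network in `WEIGHTS`, the noise level, the series length, and the cut-off `f_threshold` are hypothetical choices made for this example only.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on columns plus an intercept."""
    design = [[1.0] * len(y)] + columns
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = solve(gram, rhs)
    fitted = [sum(b * col[t] for b, col in zip(beta, design)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def select_regulators(series, target, f_threshold=20.0):
    """Return the genes whose lagged expression passes a partial F-test for `target`.

    series[j] is the expression time series of gene j. The cut-off
    f_threshold is a hypothetical, deliberately conservative value.
    """
    y = series[target][1:]                 # expression of the target at t+1
    lagged = [s[:-1] for s in series]      # expression of every gene at t
    n_genes = len(series)
    dof = len(y) - (n_genes + 1)           # residual degrees of freedom, full model
    rss_full = rss(lagged, y)
    selected = []
    for j in range(n_genes):
        reduced = [lagged[k] for k in range(n_genes) if k != j]
        # Partial F-statistic for dropping the single regressor j.
        f_stat = (rss(reduced, y) - rss_full) / (rss_full / dof)
        if f_stat > f_threshold:
            selected.append(j)
    return selected


def simulate(weights, n_steps, seed=0):
    """Simulate x(t+1) = W x(t) + noise with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    n = len(weights)
    state = [0.0] * n
    series = [[] for _ in range(n)]
    for _ in range(n_steps):
        state = [sum(w * s for w, s in zip(weights[i], state)) + rng.gauss(0.0, 1.0)
                 for i in range(n)]
        for i in range(n):
            series[i].append(state[i])
    return series


# Hypothetical reference network: WEIGHTS[i][j] is the effect of gene j on gene i.
WEIGHTS = [[0.6, 0.0, 0.0],
           [0.7, 0.5, 0.0],
           [0.0, -0.7, 0.5]]

series = simulate(WEIGHTS, n_steps=1000)
inferred = {gene: select_regulators(series, gene) for gene in range(3)}
print(inferred)
```

Each entry of `inferred` lists the candidate regulators retained for one target gene and can be compared against the non-zero entries in the corresponding row of `WEIGHTS`. Testing every candidate against the full model is a simplification that is only practical when the number of genes is small relative to the number of time points; stepwise procedures are a common alternative for larger networks.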

Introduction

In recent years, numerous methods have been developed and applied to reconstruct the structure and dynamic rules of gene-regulatory networks from different high-throughput data sources, mainly microarray-based gene expression analysis, promoter sequence information, chromatin immunoprecipitation (ChIP) and protein-protein interaction assays [3,4,5,6]. In this study we focus on the following three challenges facing the reconstruction of gene-regulatory networks from time series of gene expression data:

Methods

Results

Conclusion