Abstract

Inferring the gene regulatory network (GRN) is crucial to understanding the working of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. all the relevant factors in the causal network have been observed and there are no unobserved common cause. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and therefore are impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer an GRN from time series data allowing the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous works. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviation from Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.

Highlights

  • Knowing the gene regulatory network (GRN) in the cell is crucial to understanding the working of the cell

  • The medians are taken over the 20 replicates. st2, st3 and st4 are for score thresholds of 2, 3 and 4 respectively

  • We have tested our algorithm on 3 types of synthetic data: small GRN with one hidden node, small GRN with no hidden node, and large GRN with a small but unknown number of hidden nodes

Read more

Summary

Introduction

Knowing the gene regulatory network (GRN) in the cell is crucial to understanding the working of the cell. [19] first estimates the possible delays from pairwise mutual information from discretized expression data, infer multiple time step DBN by minimizing MDL score using genetic algorithm. [41] is a Granger-causality based method that learns a mixed graph from time series data, where directed edges represent direct causal relationship, and dashed edges represent relationship due to hidden common cause. [46] uses nested effects models using perturbation data with no time delays, and hidden common effect of two observed variables may be predicted, and some edges indicate possible presence of hidden nodes. [51] learns a discrete Bayesian network with hidden variable without time delays It assumes that a hidden variable has observed variables as parents and children. Doi:10.1371/journal.pone.0138596.g001 four, as the smallest semi-clique has size four. [52] complements [51] and focuses on learning the dimensionality (the number of states) of hidden variables

Objective
Methods
Experiment Results
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call