Abstract

Background: Gene expression time series data usually take the form of high-dimensional arrays. Unfortunately, the data may contain missing values: the expression values of some genes at some time points, or the entire set of expression values at a single time point or at several consecutive time points. This significantly degrades the performance of many gene expression analysis algorithms that take the complete matrix of gene expression measurements as input. For instance, previous work has shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for inferring gene regulatory networks from gene expression data with missing values.

Results: We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear nature of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule, and the central difference rule, to compute the related posteriors. The proposed algorithm is evaluated, with satisfactory results, on synthetic networks, on in silico networks released as part of the DREAM project, and on a real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of the yeast Saccharomyces cerevisiae.

Conclusion: PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we propose a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces better inference of the model parameters and hence more accurate prediction of the underlying GRN than conventional Gaussian approximation (GA) filters that ignore the missing data points.

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0055-8) contains supplementary material, which is available to authorized users.

Highlights

  • Gene regulation is one of the most important processes that take place in living cells [1, 2]

  • Benchmarking is done by counting the number of links correctly predicted by the algorithm, the number of incorrectly predicted links, the number of true links missed in the inferred network, and the number of correctly identified non-existing links

  • The following performance metrics are defined: true positive rate (TPR), also known as recall or sensitivity, TPR = TP/(TP+FN); positive predictive value (PPV), or precision, PPV = TP/(TP+FP); and false positive rate, FPR = FP/(FP+TN), where specificity = 1 - FPR. Here TP, FP, FN, and TN denote the numbers of true positives, false positives, false negatives, and true negatives
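
The benchmarking counts and metrics in the highlights above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the true and inferred networks are given as binary adjacency matrices (nested lists) in which entry [i][j] = 1 means a link from gene i to gene j; the function names `link_counts` and `rates` and the 3-gene example are hypothetical and are not taken from the article.

```python
def link_counts(true_adj, pred_adj):
    """Count TP, FP, FN, TN by comparing two binary adjacency matrices.

    Assumption: every matrix entry (including the diagonal) is a
    candidate link; the article may treat self-links differently.
    """
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(true_adj, pred_adj):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1      # link exists and was predicted
            elif p:
                fp += 1      # link predicted but does not exist
            elif t:
                fn += 1      # true link missed by the inference
            else:
                tn += 1      # non-existing link correctly left out
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Compute TPR, PPV, FPR and specificity from the four counts."""
    nan = float("nan")
    tpr = tp / (tp + fn) if (tp + fn) else nan   # recall / sensitivity
    ppv = tp / (tp + fp) if (tp + fp) else nan   # precision
    fpr = fp / (fp + tn) if (fp + tn) else nan
    return {"TPR": tpr, "PPV": ppv, "FPR": fpr, "specificity": 1 - fpr}


# Hypothetical 3-gene example (not data from the article).
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
pred_adj = [[0, 1, 1],
            [0, 0, 0],
            [1, 0, 0]]

counts = link_counts(true_adj, pred_adj)
print(counts)            # (2, 1, 1, 5)
print(rates(*counts))
```

In this toy case two of the three true links are recovered, one spurious link is predicted, and one true link is missed, so TPR and PPV are both 2/3 and FPR is 1/6.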


Summary

Introduction

Gene regulation is one of the most important processes that take place in living cells [1, 2]. It includes control over the transcription of messenger RNA (mRNA) and the eventual translation of mRNA into protein via gene regulatory networks (GRNs). Gene expression data may contain missing values: the expression values of some genes at some time points, or the entire set of expression values at a single time point or at several consecutive time points. This significantly degrades the performance of many gene expression analysis algorithms that take the complete matrix of gene expression measurements as input. To date, few algorithms have been proposed for inferring gene regulatory networks from gene expression data with missing values.

