Abstract

Identifying the reactions that govern a dynamical biological system is a crucial but challenging task in systems biology. In this work, we present a data-driven method to infer the underlying biochemical reaction system governing a set of observed species concentrations over time. We formulate the problem as a regression over a large, but limited, mass-action constrained reaction space and use sparse Bayesian inference with the regularized horseshoe prior to produce robust, interpretable biochemical reaction networks, along with uncertainty estimates for the parameters. The resulting systems of chemical reactions and their posteriors point the biologist to potentially several reaction systems that can be investigated further. We demonstrate the method on two examples of recovering the dynamics of an unknown reaction system, illustrating the gains in accuracy and in the information obtained.

Highlights

  • Developments in high-throughput experimental methodologies in biology have enabled the collection of massive amounts of time varying molecular data at small scales

  • We introduce the regularized horseshoe prior [24] used in our Bayesian formulation of the Reactive Sparse Identification of Nonlinear Dynamics (SINDy) model and the modified observational model, which better captures the measurement process and avoids biased, low-order derivative estimates

  • Results are compared with those of Reactive SINDy to show that our model obtains a network, with uncertainty estimates, that replicates the observations, and that it performs better when observations are sparse, owing to the modified observational model
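
The regression over a mass-action constrained reaction space can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the three species, the four candidate reactions, the rate constants, and the helper names `mass_action_library` and `regularized_horseshoe_scale`. The sketch uses exact derivatives and ordinary least squares in place of the Bayesian inference and modified observational model described above, so it shows only how candidate reactions become regression columns and how the regularized horseshoe bounds the prior scale of a coefficient.

```python
import numpy as np

# Hypothetical candidate reactions over species (A, B, C), written as
# (reactant indices, product indices). Illustrative only.
CANDIDATES = [
    ((0,), (1,)),      # A -> B
    ((1,), (0,)),      # B -> A
    ((0, 1), (2,)),    # A + B -> C
    ((2,), (0, 1)),    # C -> A + B
]

def mass_action_library(x, candidates):
    """Return an (n_species, n_reactions) matrix for one state x.

    Column r is the net stoichiometric change of reaction r times its
    mass-action flux (product of reactant concentrations), so that
    dx/dt = library @ k for a vector of rate constants k.
    """
    lib = np.zeros((len(x), len(candidates)))
    for r, (reactants, products) in enumerate(candidates):
        flux = np.prod([x[i] for i in reactants])
        for i in reactants:
            lib[i, r] -= flux
        for i in products:
            lib[i, r] += flux
    return lib

def regularized_horseshoe_scale(lam, tau, c):
    """Prior standard deviation tau * lam_tilde of one coefficient, where
    lam_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)."""
    lam_tilde_sq = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
    return tau * np.sqrt(lam_tilde_sq)

# Assumed ground truth: only A + B <-> C is active.
true_k = np.array([0.0, 0.0, 2.0, 0.5])

# Random states stand in for observed concentrations.
rng = np.random.default_rng(0)
states = rng.uniform(0.1, 2.0, size=(50, 3))

# Stack the per-state libraries into one design matrix and use exact
# derivatives (a simplification; real data would be noisy).
theta = np.vstack([mass_action_library(x, CANDIDATES) for x in states])
dxdt = theta @ true_k

# Ordinary least squares stands in for sparse Bayesian inference here.
k_hat, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
```

With noise-free derivatives the least-squares fit recovers the assumed rate constants, including the zeros for the inactive reactions. With noisy or sparse observations this is no longer the case, which is where a sparsity-inducing prior becomes useful: under the regularized horseshoe, a coefficient's prior scale is roughly `tau * lam` for small local scales and is capped near the slab width `c` for large ones.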



Introduction

Developments in high-throughput experimental methodologies in biology have enabled the collection of massive amounts of time-varying molecular data at small scales. This has resulted in significant advances in understanding the biochemical networks and mechanisms underlying physiological processes such as gene regulation. An appealing avenue is to use data-driven approaches for system identification, whereby plausible biochemical reaction networks are generated and estimated directly from data without the need to propose a system in advance. Although many such methods have recently been developed to infer networks from a wide variety of datasets, this remains a challenging statistical and computational task [4]. Most work on network estimation focuses either on reconstructing a network without assuming any known dynamics, because the time series measurements are destructive [5,6,7], or on producing networks that replicate the dynamics without focusing on interpretability [8,9,10,11,12].
