The stochastic kinetics of biochemical reaction networks (BRNs) is described by the chemical master equation (CME) and the underlying law of mass action. Assuming network-free simulation of rule-based models of BRNs, this paper departs from the usual analysis of network dynamics in terms of time-dependent distributions of chemical species counts, and instead statistically evaluates the sequences of reaction events generated by stochastic simulations. Such reaction event time series can be used for clustering reactions, identifying rare events, and recognizing periods of increased or steady-state activity. However, the main aim of this paper is to devise an effective method for identifying causally and anti-causally related sub-sequences of reaction events from their empirical probabilities. This makes it possible to discover some of the causal dynamics of BRNs and to uncover their short-term deterministic behaviors. In particular, it is proposed that reaction sub-sequences that are conditionally nearly certain or nearly impossible can be considered causally related. Moreover, since the time ordering of reaction events is locally irrelevant, the reaction sub-sequences can be transformed into sets or multisets of reactions. Distance metrics can then be used to define equivalences among reaction events. The proposed method for identifying causally related reaction sub-sequences has been implemented as a computationally efficient query-response mechanism. The method was evaluated on five models of genetic networks in seven numerical experiments. The models were simulated in BioNetGen using the open-source network-free simulator NFsim. This simulator first had to be modified to record traces of reaction events; the modified version is available in the GitHub repository ploskot/nfsim_1.20. The generated event time series were analyzed with Python and Matlab scripts.
The whole process of data generation, analysis, and visualization has been almost fully automated using shell scripts. This demonstrates the potential for substantially increasing research productivity by building automated data generation and processing pipelines.
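The abstract does not specify the estimator behind the query-response mechanism. As a rough illustration only, the following Python sketch does three things. It counts how often each fixed-length reaction prefix is followed by each reaction in a single event trace. It flags conditional probabilities close to one (nearly certain) or close to zero (nearly impossible). It also compares sub-sequences as order-free multisets. Several details are hypothetical choices and are not taken from the paper: the window length `k`, the thresholds `eps` and `min_support`, the count-difference distance, and all function names. The trace is also assumed to be a plain list of reaction labels rather than the actual output format of the modified NFsim.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical event type: any hashable reaction label (e.g. a rule name).
Event = Hashable
Prefix = Tuple[Event, ...]
Probs = Dict[Tuple[Prefix, Event], Tuple[float, int]]


def conditional_probs(trace: Sequence[Event], k: int) -> Probs:
    """Estimate P(next event = b | previous k events = a) from one event trace.

    Returns {(a, b): (probability, number of times prefix a was observed)}.
    Every reaction seen in the trace is scored as a candidate b, so pairs
    that never occurred get probability 0.0.
    """
    alphabet = set(trace)
    prefix_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    for i in range(len(trace) - k):
        a = tuple(trace[i:i + k])
        prefix_counts[a] += 1
        joint_counts[(a, trace[i + k])] += 1
    return {
        (a, b): (joint_counts[(a, b)] / n_a, n_a)
        for a, n_a in prefix_counts.items()
        for b in alphabet
    }


def flag_relations(probs: Probs, eps: float = 0.05, min_support: int = 10):
    """Split (prefix, next event) pairs into nearly certain and nearly impossible ones."""
    causal: List[Tuple[Prefix, Event, float]] = []
    anti_causal: List[Tuple[Prefix, Event, float]] = []
    for (a, b), (p, n_a) in probs.items():
        if n_a < min_support:
            continue  # prefix observed too rarely to trust the estimate
        if p >= 1.0 - eps:
            causal.append((a, b, p))
        elif p <= eps:
            anti_causal.append((a, b, p))
    return causal, anti_causal


def multiset_distance(x: Sequence[Event], y: Sequence[Event]) -> int:
    """Order-free distance: total absolute difference of per-reaction counts."""
    cx, cy = Counter(x), Counter(y)
    return sum(abs(cx[r] - cy[r]) for r in cx.keys() | cy.keys())


if __name__ == "__main__":
    # Toy trace made up for illustration; it is not simulation output.
    trace = ["R1", "R2", "R3", "R1", "R2", "R3", "R1", "R2", "R4"] * 5
    causal, anti_causal = flag_relations(
        conditional_probs(trace, k=1), eps=0.05, min_support=5
    )
    print(causal)  # [(('R1',), 'R2', 1.0), (('R3',), 'R1', 1.0)]
    print(multiset_distance(["R1", "R2", "R3"], ["R3", "R1", "R2"]))  # 0
```

Every observed reaction is scored as a candidate successor. This is what allows zero-probability (anti-causal) pairs to appear at all. The `min_support` guard keeps rarely seen prefixes from producing spurious estimates of exactly 0 or 1.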