Abstract
Boolean-based methods, despite their simplicity, would be a more attractive approach for inferring networks from high-throughput expression data if their effectiveness were not limited by a high rate of false positive predictions. In this study, we explored factors that can easily be adjusted to improve the accuracy of network inference. Our work focused on analysing how discretisation methods, biological constraints, and the stringency of Boolean function assignment affect the performance of Boolean networks, measured as accuracy, precision, specificity and sensitivity, using three sets of microarray time-series data. The study showed that biological constraints have a pivotal influence on network performance, outweighing the other factors. They can reduce the variation in network performance that results from an arbitrary choice of discretisation method and stringency setting. We also presented the master Boolean network as an approach to establishing a unique solution for Boolean analysis. The information acquired from the analysis was summarised into a general guideline for the efficient use of Boolean-based methods in network inference. Finally, we provided an example of applying this guideline to the Arabidopsis circadian clock genetic network, from which much interesting biological information can be inferred.
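The abstract names three adjustable factors (discretisation method, biological constraints, and stringency of Boolean function assignment) but does not describe how each is implemented. The following is a minimal Python sketch under stated assumptions, not the authors' procedure. It assumes mean- or median-threshold binarisation. It fits majority-vote truth tables to one-step transitions, treats stringency as the minimum fraction of transitions the fitted function must reproduce, and expresses a biological constraint as a hypothetical whitelist of permitted regulators (`allowed`). All function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def discretise(series, method="mean"):
    """Binarise one gene's time series: 1 if a value exceeds the threshold, else 0.

    Mean and median thresholds are illustrative choices only (assumption).
    """
    threshold = mean(series) if method == "mean" else median(series)
    return [1 if x > threshold else 0 for x in series]


def best_function(target, regulators):
    """Fit a truth table mapping regulator states at t to the target state at t+1.

    Each observed input pattern gets its majority output; the score is the
    fraction of transitions that the table reproduces.
    """
    counts = {}
    for t in range(len(target) - 1):
        pattern = tuple(r[t] for r in regulators)
        counts.setdefault(pattern, [0, 0])[target[t + 1]] += 1
    table = {p: 1 if c[1] > c[0] else 0 for p, c in counts.items()}
    agreements = sum(max(c) for c in counts.values())
    return table, agreements / (len(target) - 1)


def infer_edges(data, stringency=1.0, k=1, method="mean", allowed=None):
    """Return (regulators, target) pairs whose best Boolean function explains
    at least `stringency` of the observed transitions.

    `allowed` is a hypothetical biological constraint: if given, only genes in
    this set may act as regulators (e.g. known transcription factors).
    """
    binary = {g: discretise(s, method) for g, s in data.items()}
    candidates = sorted(g for g in binary if allowed is None or g in allowed)
    edges = []
    for target in binary:
        for regs in combinations(candidates, k):
            if target in regs:  # self-regulation is excluded in this sketch
                continue
            _, score = best_function(binary[target], [binary[r] for r in regs])
            if score >= stringency:
                edges.append((regs, target))
    return edges


# Toy data (hypothetical): B copies A with a one-step delay.
data = {
    "A": [0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
    "B": [0.2, 0.1, 0.8, 0.2, 0.9, 0.1],
}
print(infer_edges(data, stringency=1.0))                 # [(('A',), 'B')]
print(infer_edges(data, stringency=0.8))                 # [(('B',), 'A'), (('A',), 'B')]
print(infer_edges(data, stringency=0.8, allowed={"A"}))  # [(('A',), 'B')]
```

On this toy data, lowering the stringency from 1.0 to 0.8 admits an extra candidate edge B → A. Restricting the permitted regulators to `{"A"}` discards that edge again, which shows how a constraint can prune candidates that a looser setting lets through.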
Highlights
The first dataset was downloaded from http://cmgm.stanford.edu/pbrown/explore/, where a full description of the data can be found. It is a seven-datapoint time series capturing the response of the galactose utilisation pathway to a decline in glucose concentration (19, 18.7, 17.6, 14, 7.5, 0.2, and 0 g/l). For the other two datasets, the Affymetrix CEL files were downloaded from the NCBI database: the twelve-datapoint time series measures the circadian clock under continuous light (LL) [14], while the six-datapoint time series measures the same system under a light/dark (LD) cycle [15].
We showed that the improved network performance results from fewer false positive (FP) predictions and more true negative (TN) predictions (Figure S4).
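To make the link between FP/TN counts and the four performance measures concrete, the sketch below compares a predicted edge set with a reference network over all directed gene pairs, excluding self-loops. The gene names, both edge sets, and the convention of returning 0.0 for an empty denominator are hypothetical and are not taken from the study.

```python
def confusion(predicted, reference, genes):
    """Count TP, FP, TN, FN over all directed gene pairs (self-loops excluded).

    `predicted` and `reference` are sets of (regulator, target) tuples.
    """
    possible = {(r, t) for r in genes for t in genes if r != t}
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(possible - predicted - reference)
    return tp, fp, tn, fn


def _ratio(num, den):
    # Convention assumed here: report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def performance(tp, fp, tn, fn):
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": _ratio(tp, tp + fn),
    }


# Hypothetical 4-gene example: 12 possible directed edges.
genes = ["A", "B", "C", "D"]
reference = {("A", "B"), ("B", "C"), ("C", "D")}
unconstrained = {("A", "B"), ("B", "C"), ("A", "C"), ("D", "A"), ("C", "B")}
constrained = {("A", "B"), ("B", "C"), ("A", "C")}

for name, pred in [("unconstrained", unconstrained), ("constrained", constrained)]:
    counts = confusion(pred, reference, genes)
    print(name, counts, performance(*counts))
# unconstrained: (TP, FP, TN, FN) = (2, 3, 6, 1)
# constrained:   (TP, FP, TN, FN) = (2, 1, 8, 1)
```

Moving two predictions from FP to TN raises accuracy, precision and specificity while leaving sensitivity unchanged, because sensitivity depends only on TP and FN.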
Summary
Genetic network inference has been a widely studied research area since systems biology was introduced to unravel the complex regulation underlying biological systems. In contrast with the aforementioned groups, whose analyses rely on experimentally measured data, reverse-engineering algorithms classed as in silico-based methods carry out the computation on an in silico dataset generated by a model. In principle, they exploit model simulation to generate the data of interest for the analysis. This approach may be advantageous only for relatively small-scale datasets, because simulating a large system is technically difficult and often computationally infeasible [3,8].
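In silico data come from simulating a model. The Summary does not specify the model class. As an illustration only, the sketch assumes the model is a synchronous Boolean network and generates a time series from hypothetical rules for a three-gene loop. The helper `state_space_size` (a hypothetical name) shows that exhaustively covering every initial state grows as 2^n in the number of genes n, which is one reason large systems quickly become infeasible to simulate.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Rule = Callable[[State], int]


def simulate(rules: Dict[str, Rule], initial: State, steps: int) -> List[State]:
    """Synchronously update every gene from the current state for `steps` steps."""
    trajectory = [dict(initial)]
    for _ in range(steps):
        current = trajectory[-1]
        trajectory.append({gene: int(rule(current)) for gene, rule in rules.items()})
    return trajectory


def state_space_size(n_genes: int) -> int:
    """Number of distinct states an exhaustive simulation would have to cover."""
    return 2 ** n_genes


# Hypothetical three-gene loop: A is repressed by C, B follows A, C follows B.
rules: Dict[str, Rule] = {
    "A": lambda s: 1 - s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}
series = simulate(rules, {"A": 1, "B": 0, "C": 0}, steps=6)
for t, state in enumerate(series):
    print(t, state)
print(state_space_size(3), state_space_size(30))  # 8 1073741824
```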