Abstract

The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The minimum description length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However, the existing method uses an ad hoc measure of description length that necessitates a tuning parameter for artificially balancing the model and error costs and, as a result, directly conflicts with the MDL principle's implied universality. In order to surpass this difficulty, we propose a novel MDL-based method in which the description length is a theoretical measure derived from a universal normalized maximum likelihood model. The search space is reduced by applying an implementable analogue of Kolmogorov's structure function. The performance of the proposed method is demonstrated on random synthetic networks, for which it is shown to improve upon previously published network inference algorithms with respect to both speed and accuracy. Finally, it is applied to time-series Drosophila gene expression measurements.

Highlights

  • The modeling of gene regulatory networks is a major focus of systems biology because, depending on the type of modeling, the networks can be used to model interdependencies between genes, to study the dynamics of the underlying genetic regulation, and to provide a basis for the derivation of optimal intervention strategies

  • One can consider the class Gκg composed of all Boolean networks with indegrees bounded by κ

  • We observe that, while normalized maximum likelihood (NML) minimum description length (MDL) with fixed K performs better over all Boolean functions, invoking the structure function (SF) yields error rates much closer to the fixed K approach when we are restricted to canalizing functions


Summary

Introduction

The modeling of gene regulatory networks is a major focus of systems biology because, depending on the type of modeling, the networks can be used to model interdependencies between genes, to study the dynamics of the underlying genetic regulation, and to provide a basis for the derivation of optimal intervention strategies. Bayesian networks [1, 2] and dynamic Bayesian networks [3, 4] provide models to elucidate dependency relations; functional networks, such as Boolean networks [5] and probabilistic Boolean networks [6], provide the means to characterize steady-state behavior. The MDL principle balances error (deviation from the data) and model complexity by using a cost function consisting of a sum of entropies, one relative to encoding the error and the other relative to encoding the model description [18]. The Network MDL algorithm often yields good results, but it does so with an ad hoc coding scheme that requires a user-specified tuning parameter. We will avoid this drawback by achieving a codelength via a normalized maximum likelihood model. We will improve upon Network MDL’s efficiency by applying an analogue of Kolmogorov’s structure function [21].
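To make concrete the idea of a codelength that needs no tuning parameter, the following is a minimal sketch of the NML codelength for the simplest possible case, a binary sequence under the Bernoulli model class. It is an illustration only and is not the Boolean-network codelength derived in the paper; the function name `bernoulli_nml_codelength` is hypothetical. The codelength is the negative log of the maximized likelihood plus the log of a normalizer that sums the maximized likelihood over every sequence of the same length, so the trade-off between fit and complexity is fixed by the model class itself.

```python
from math import comb, log2


def bernoulli_nml_codelength(bits):
    """NML codelength (in bits) of a 0/1 sequence under the Bernoulli class.

    Illustrative only: this is the textbook Bernoulli case, not the
    Boolean-network model used in the paper.
    """
    n = len(bits)
    k = sum(bits)

    def max_likelihood(ones, total):
        # Likelihood at the maximum-likelihood estimate p = ones / total.
        # Python evaluates 0.0 ** 0 as 1.0, so the edge cases are handled.
        if total == 0:
            return 1.0
        p = ones / total
        return (p ** ones) * ((1 - p) ** (total - ones))

    # Normalizer: maximized likelihood summed over all 2**n sequences,
    # grouped by their number of ones.
    normalizer = sum(comb(n, j) * max_likelihood(j, n) for j in range(n + 1))

    # Error cost (first term) plus model cost (second term), no free weight.
    return -log2(max_likelihood(k, n)) + log2(normalizer)
```

For example, `bernoulli_nml_codelength([1, 1, 1, 1])` is shorter than `bernoulli_nml_codelength([0, 1, 0, 1])`, because a constant sequence is fit perfectly by the class while the model cost is identical for both.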

Boolean Networks
The MDL Principle
Normalized Maximum Likelihood
Stochastic Complexity
Kolmogorov’s Structure Function
Performance on Simulated Data
Random Networks
Canalizing Networks
Application to Drosophila Data
Concluding Remarks