Abstract

The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on many factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered a systems approach because we simulate gene expression data from an underlying gene regulatory network instead of sampling from an assumed distribution. We demonstrate that the latter approach, although popular and the traditional way of studying MI estimators, is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for the efficient design of a simulation study in the context of network inference, supporting a systems approach.

Highlights

  • The mutual information (MI) is a measure to quantify the nonlinear dependency between two random variables [1,2]

  • In this study we presented a comprehensive investigation of the influence the MI estimators have on the inference performance of C3NET

  • We observed a strong influence of the MI estimators and the discretization methods on the inference performance, revealed by global and local network-based error measures

Introduction

The mutual information (MI) is a measure to quantify the nonlinear dependency between two random variables [1,2]. The most popular strategies for estimating mutual information values are based on a discretized model for continuous data [3]. This strategy is widely known as the histogram approach, which approximates the joint probability distribution of the two discretized random variables by the empirical joint frequencies in their bins [1]. The Empirical estimator has been shown to underestimate the entropy because cell frequencies are undersampled and because the number of zero cell frequencies increases with the number of bins [4]. This is a major problem for practical applications because data are finite while accurate estimates require a large number of bins. To account for the induced bias of the MI estimate, a variety of methods were developed that adjust the estimate by a constant factor [3], use a shrinkage regularization [5] or employ a Bayesian approach to estimate the joint frequencies for the bins from a Dirichlet distribution [6] to gain more accurate estimates.
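
The histogram approach described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study; it assumes equal width discretization, natural-log units, MI computed as H(X) + H(Y) - H(X,Y), and the standard Miller-Madow correction, which adds (m - 1)/(2N) to the empirical entropy for m non-empty cells and N samples. The function names (`equal_width_bins`, `entropy`, `mutual_information`) are hypothetical and chosen only for this example.

```python
import numpy as np


def equal_width_bins(x, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal width bins."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Only interior edges are used, so the minimum falls in bin 0
    # and the maximum in bin n_bins-1.
    return np.digitize(x, edges[1:-1])


def entropy(counts, miller_madow=False):
    """Empirical (plug-in) entropy in nats, optionally Miller-Madow corrected."""
    counts = np.asarray(counts, dtype=float).ravel()
    n = counts.sum()
    p = counts[counts > 0] / n          # empty cells contribute nothing
    h = -np.sum(p * np.log(p))
    if miller_madow:
        # Assumed standard correction: (non-empty cells - 1) / (2 * samples)
        h += (p.size - 1) / (2.0 * n)
    return h


def mutual_information(x, y, n_bins, miller_madow=False):
    """Histogram-based MI estimate: H(X) + H(Y) - H(X, Y)."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1)       # joint cell counts
    hx = entropy(joint.sum(axis=1), miller_madow)
    hy = entropy(joint.sum(axis=0), miller_madow)
    hxy = entropy(joint, miller_madow)
    return hx + hy - hxy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = x + rng.normal(scale=0.5, size=200)   # dependent on x
    print(mutual_information(x, y, n_bins=8))
    print(mutual_information(x, y, n_bins=8, miller_madow=True))
```

Because the correction is applied to each of the three entropies separately, the corrected MI differs from the empirical MI by a term that depends on how many marginal and joint cells are occupied, which is one way the choice of discretization and the number of bins can interact with the estimator.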
