Abstract

Methods based on correlation and partial correlation are widely employed today to reconstruct statistical interaction graphs from high-throughput omics data. These dedicated methods work well even when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related, using Neumann series and transitive closure and discussing small concrete examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods for large realistic graphs. In particular, we perform the comparisons both with parameters selected optimally on the basis of the true underlying graph and with data-driven approaches in which the parameters are estimated directly from the data.

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0052-y) contains supplementary material, which is available to authorized users.

Highlights

  • Inference of biological networks including gene regulatory, metabolic, and protein-protein interaction networks has received much attention recently

  • We provide a practical guide for researchers when using correlation and partial correlation methods and we believe that understanding these two concepts

  • We analyze the relation between concentration and covariance graphs and conduct a detailed comparison of graph reconstruction methods designed to infer concentration as well as covariance graphs


Summary

Introduction

Inference of biological networks, including gene regulatory, metabolic, and protein-protein interaction networks, has received much attention recently. Correlation methods based on covariance matrix estimation are widely used to reconstruct gene co-expression and module graphs, especially in large-scale biomedical applications [6,7,8]. Methods based on the concentration or partial correlation matrix, by contrast, infer only direct dependencies between variables. In this respect, one can distinguish two graph types resulting from correlation-based and partial correlation-based methods, which we will call covariance and concentration graphs, respectively, in the following. Although the covariance graph includes indirect dependencies, it is widely used in applications to represent sparse biological graphs, either by simple hard-thresholding [6] or by estimating the covariance matrix with shrinkage methods [9].
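The distinction between the two graph types can be illustrated with a small sketch. It is not taken from the article: the three-variable chain, the threshold `tau = 0.1`, and all function names are assumptions chosen for illustration. The concentration matrix is written as Ω = I − A, so that, whenever the spectral radius of A is below one, the covariance matrix equals the Neumann series I + A + A² + …; the A² term is what places an indirect edge between the two ends of the chain in the covariance graph.

```python
import numpy as np


def correlation_from_covariance(sigma):
    """Scale a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)


def partial_correlation_from_covariance(sigma):
    """Partial correlations from the concentration (inverse covariance) matrix."""
    omega = np.linalg.inv(sigma)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def hard_threshold_graph(matrix, tau):
    """Adjacency matrix keeping entries whose absolute value exceeds tau."""
    adj = np.abs(matrix) > tau
    np.fill_diagonal(adj, False)
    return adj


# Hypothetical chain 0 - 1 - 2: variables 0 and 2 interact only through 1.
omega_true = np.array([[1.0, -0.4, 0.0],
                       [-0.4, 1.0, -0.4],
                       [0.0, -0.4, 1.0]])
sigma = np.linalg.inv(omega_true)

# Neumann series: with omega = I - A and spectral radius of A below 1,
# sigma = I + A + A^2 + ...  (truncated here after 60 terms).
A = np.eye(3) - omega_true
neumann_sum = sum(np.linalg.matrix_power(A, k) for k in range(60))

tau = 0.1  # illustrative threshold, not a recommended value
cov_graph = hard_threshold_graph(correlation_from_covariance(sigma), tau)
conc_graph = hard_threshold_graph(partial_correlation_from_covariance(sigma), tau)

print("Neumann series matches covariance:", np.allclose(neumann_sum, sigma))
print("covariance graph:\n", cov_graph.astype(int))
print("concentration graph:\n", conc_graph.astype(int))
```

In this toy case the covariance graph connects all three variables, while the concentration graph keeps only the two direct edges of the chain. With real data both matrices have to be estimated, so the recovered graphs also depend on the estimator and on how the threshold is chosen.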

