From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data

Rainer Opgen-Rhein, Korbinian Strimmer

doi:10.1186/1752-0509-1-37

Open Access

Abstract

Background: The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations not only confound direct and indirect associations but also provide no means to distinguish between cause and effect. For "causal" analysis, the inference of a directed graphical model is typically required. However, this is rather difficult due to the curse of dimensionality.

Results: We propose a simple heuristic for the statistical learning of a high-dimensional "causal" network. The method first converts a correlation network into a partial correlation graph. Subsequently, a partial ordering of the nodes is established by multiple testing of the log-ratio of standardized partial variances. This allows identifying a directed acyclic causal network as a subgraph of the partial correlation network. We illustrate the approach by analyzing a large Arabidopsis thaliana expression data set.

Conclusion: The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations. Nevertheless, for small samples and for sparse networks the algorithm not only yields sensible first order approximations of the causal structure in high-dimensional genomic data but is also computationally highly efficient.

Availability and Requirements: The method is implemented in the "GeneNet" R package (version 1.2.0), available from CRAN and from . The software includes an R script for reproducing the network analysis of the Arabidopsis thaliana data.
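The two quantities named in the abstract, partial correlations and standardized partial variances, can both be read off the inverse of a correlation matrix. The following is a minimal sketch in plain Python, not the GeneNet implementation: the function names and the three-variable correlation matrix are hypothetical, and the sketch omits the estimation of the correlation matrix from small samples as well as the multiple testing of edges and log-ratios that the method relies on.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations_and_spv(corr):
    """Return (pcor, spv) computed from a correlation matrix.

    pcor[i][j] is the full order partial correlation between i and j given
    all other variables; spv[i] is the standardized partial variance of i,
    i.e. the fraction of its variance not explained by the other variables.
    """
    omega = invert(corr)
    n = len(corr)
    pcor = [[1.0 if i == j
             else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]
    spv = [1.0 / omega[i][i] for i in range(n)]
    return pcor, spv


def log_spv_ratio(spv, i, j):
    """Log-ratio of standardized partial variances for the node pair (i, j)."""
    return math.log(spv[i] / spv[j])


# Hypothetical correlations for three variables A, B, C in which the
# A-C correlation (0.3) is fully explained by B (0.6 * 0.5).
corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.5],
        [0.3, 0.5, 1.0]]
pcor, spv = partial_correlations_and_spv(corr)
```

In this toy matrix the partial correlation between A and C vanishes, so the indirect A-C association would drop out of the partial correlation graph. In the method described above, a log-ratio such as `log_spv_ratio(spv, 0, 1)` is only used to orient an edge if it differs significantly from zero.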

Highlights

The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations confound direct and indirect associations and provide no means to distinguish between cause and effect.
The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations.
Interpretation of the resulting graph: the above algorithm returns a partially directed partial correlation graph, whose directed edges form a causal network. This procedure can be motivated by the following connection between the partial correlation graph and a system of linear equations, in which each node is in turn taken as the response variable and regressed against all remaining nodes.
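The connection to linear regression can be written out with a standard identity. The notation here is an assumption and not taken from the text above: $\omega_{ij}$ denotes an entry of the inverse covariance matrix, $\sigma_{yy}$ a variance, $\tilde\rho_{yk}$ the full order partial correlation between nodes $y$ and $k$, and $\mathrm{SPV}_y = 1/(\sigma_{yy}\,\omega_{yy})$ the standardized partial variance of node $y$. When $y$ is regressed against all remaining nodes, the coefficient of node $k$ is

```latex
\beta_k^{y}
  = -\frac{\omega_{yk}}{\omega_{yy}}
  = \tilde\rho_{yk}\,\sqrt{\frac{\omega_{kk}}{\omega_{yy}}}
  = \tilde\rho_{yk}\,
    \sqrt{\frac{\mathrm{SPV}_y}{\mathrm{SPV}_k}}\,
    \sqrt{\frac{\sigma_{yy}}{\sigma_{kk}}},
\qquad
\tilde\rho_{yk} = -\frac{\omega_{yk}}{\sqrt{\omega_{yy}\,\omega_{kk}}}.
```

Read this way, the partial correlation determines whether an edge between $y$ and $k$ is present at all, while the ratio $\mathrm{SPV}_y/\mathrm{SPV}_k$ is the only factor that is asymmetric in the two nodes once the variances are standardized. That asymmetry is what makes the log-ratio of standardized partial variances a natural quantity to test when ordering the nodes.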

Summary

Introduction

The use of correlation networks is widespread in the analysis of gene expression and proteomics data. For shedding light on the causal processes underlying the observed data, however, correlation networks are of only limited use, because correlations confound direct and indirect associations and provide no means to distinguish between response variables and covariates (and thus between cause and effect). Causal analysis requires tools different from correlation networks: much of the work in this area has focused on Bayesian networks [9] or related regression models such as systems of recursive equations [10,11] or influence diagrams [12]. All of these models have in common that they describe causal relations by an underlying directed acyclic graph (DAG). The data that would be most interesting to explore with causal methods, namely those commonly visualized by correlation networks, have completely different characteristics; in particular, they are likely to be of high dimension.

Journal: BMC Systems Biology
Publication Date: Aug 6, 2007
Citations: 391
License type: CC BY 2.0
