Abstract

Background: The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations not only confound direct and indirect associations but also provide no means to distinguish between cause and effect. "Causal" analysis typically requires the inference of a directed graphical model. However, this is rather difficult due to the curse of dimensionality.

Results: We propose a simple heuristic for the statistical learning of a high-dimensional "causal" network. The method first converts a correlation network into a partial correlation graph. Subsequently, a partial ordering of the nodes is established by multiple testing of the log-ratio of standardized partial variances. This allows identifying a directed acyclic causal network as a subgraph of the partial correlation network. We illustrate the approach by analyzing a large Arabidopsis thaliana expression data set.

Conclusion: The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations. Nevertheless, for small samples and for sparse networks the algorithm not only yields sensible first order approximations of the causal structure in high-dimensional genomic data but is also computationally highly efficient.

Availability and Requirements: The method is implemented in the "GeneNet" R package (version 1.2.0), available from CRAN and from . The software includes an R script for reproducing the network analysis of the Arabidopsis thaliana data.

Highlights

  • The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations confound direct and indirect associations and provide no means to distinguish between cause and effect

  • The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations

  • Interpretation of the resulting graph: the algorithm returns a partially directed partial correlation graph whose directed edges form a causal network. This procedure can be motivated by the connection between a partial correlation graph and a system of linear equations in which each node is in turn taken as a response variable and regressed against all remaining nodes
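
The quantities named in the abstract and highlights can be illustrated on a toy example. The following is a minimal sketch, not the GeneNet implementation: it assumes the standard Gaussian graphical model identities, namely that full order partial correlations are obtained from the inverse of the correlation matrix and that the standardized partial variance of node i is the reciprocal of the i-th diagonal entry of that inverse. The function names and the three-variable correlation matrix are hypothetical, the small-sample estimation of the correlation matrix is not shown, and the multiple testing step that decides which log-ratios are significant is omitted.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Return (pcor, spv) computed from a correlation matrix.

    pcor[i][j] is the full order partial correlation of nodes i and j.
    spv[i] is the standardized partial variance of node i (assumed here
    to be 1 / omega[i][i], with omega the inverse correlation matrix).
    """
    omega = invert(corr)
    n = len(corr)
    pcor = [
        [
            1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    spv = [1.0 / omega[i][i] for i in range(n)]
    return pcor, spv


def log_spv_ratio(spv, i, j):
    """Log-ratio of standardized partial variances for the node pair (i, j)."""
    return math.log(spv[i] / spv[j])


# Hypothetical chain: nodes 0 and 2 are associated only through node 1,
# so their marginal correlation (0.64) is the product 0.8 * 0.8.
corr = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
pcor, spv = partial_correlations(corr)
print("partial correlation 0-2:", round(pcor[0][2], 6))
print("partial correlation 0-1:", round(pcor[0][1], 6))
print("log SPV ratio (0 vs 1):", round(log_spv_ratio(spv, 0, 1), 6))
```

In this toy chain the marginal correlation between nodes 0 and 2 is sizeable, while their partial correlation vanishes, which is the sense in which the partial correlation graph removes indirect associations. How the sign of a significant log-ratio translates into an edge direction is not specified in the text above, so the sketch stops at computing the ratio.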


Summary

Introduction

The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations confound direct and indirect associations and provide no means to distinguish between cause and effect. For shedding light on the causal processes underlying the observed data, correlation networks are therefore of only limited use: they cannot separate direct from indirect associations, response variables from covariates, or cause from effect. Causal analysis requires tools different from correlation networks. Much of the work in this area has focused on Bayesian networks [9] or related regression models such as systems of recursive equations [10,11] or influence diagrams [12]. All of these models have in common that they describe causal relations by an underlying directed acyclic graph (DAG). The data that would be most interesting to explore with causal methods, namely those commonly visualized by correlation networks (see above), have completely different characteristics; in particular, they are likely to be of high dimension.
