Abstract

Understanding the causal relationships between variables is a central goal of many scientific inquiries. Causal relationships may be represented by directed edges in a graph (or, equivalently, a network). In biology, for example, a gene regulatory network may be viewed as a type of causal network, where X→Y represents gene X regulating (i.e., being causal to) gene Y. However, existing general-purpose graph inference methods often produce many false edges, whereas current causal inference methods developed for observational data in genomics can handle only limited types of causal relationships. We present MRPC (a PC algorithm with the principle of Mendelian randomization), an R package that learns causal graphs with improved accuracy over existing methods. Our algorithm builds on the PC algorithm (named after its developers Peter Spirtes and Clark Glymour), a canonical algorithm in computer science for learning directed acyclic graphs. The improvements in MRPC increase accuracy in identifying v-structures (i.e., X→Y←Z) and make the results robust to the order in which the nodes appear in the input data. In the special case of genomic data that contain genotypes and phenotypes (e.g., gene expression) at the individual level, MRPC incorporates the principle of Mendelian randomization as constraints on edge direction to help orient the edges. MRPC thus allows for inference of causal graphs not only for general purposes, but also for biomedical data where multiple types of data may be input to provide evidence for causality. The R package is available on CRAN and is free, open-source software under a GPL (≥2) license.

Highlights

  • Graphical models provide a powerful mathematical framework to represent dependence among variables

  • Existing methods for inferring a Directed Acyclic Graph (DAG) or its equivalence class fall into three broad classes (Scutari, 2010): (i) constraint-based methods (Tsamardinos et al, 2003; Kalisch and Bühlmann, 2007; Colombo and Maathuis, 2014), which perform statistical tests of marginal and conditional independence for pairs of nodes; (ii) score-based methods (Peters et al, 2011; Mooij et al, 2016; Nowzohour and Bühlmann, 2016), which optimize the search according to a score function; and (iii) hybrid methods (Tsamardinos et al, 2006), which combine the former two approaches

  • (b) Adjusted Structural Hamming Distance: The SHD, as implemented in pcalg and bnlearn, counts how many differences exist between two directed graphs
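
To make the idea of counting differences between two directed graphs concrete, here is a minimal sketch of a plain structural Hamming distance in Python. It is an illustration only: the function name `shd` and the adjacency-matrix encoding are assumptions made for this example, and the code implements neither the adjusted SHD of MRPC nor the exact routines in pcalg or bnlearn. Each pair of nodes contributes one difference if its edge status (absent, i→j, j→i, or both) is not the same in the two graphs.

```python
def shd(a, b):
    """Count node pairs whose edge status differs between two directed graphs.

    a, b: square 0/1 adjacency matrices (lists of lists) over the same nodes,
    where a[i][j] == 1 means there is an edge i -> j.
    This is a plain, unadjusted count written for illustration.
    """
    n = len(a)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A missing, extra, or reversed edge each counts as one difference.
            if (a[i][j], a[j][i]) != (b[i][j], b[j][i]):
                distance += 1
    return distance


# Hypothetical example graphs over the nodes X, Y, Z (indices 0, 1, 2).
truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]      # X -> Y -> Z
inferred = [[0, 1, 0],
            [0, 0, 0],
            [0, 1, 0]]   # X -> Y <- Z

print(shd(truth, inferred))  # the Y-Z edge is reversed, so the distance is 1
```

In this toy comparison the two graphs share the same skeleton and differ only in the direction of one edge, which is the kind of discrepancy a graph comparison metric has to decide how to weigh.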


Summary

INTRODUCTION

Graphical models provide a powerful mathematical framework to represent dependence among variables. The canonical causal model (see M1 in Figure 1), X→Y→Z, where X is the instrumental variable, Y the exposure and Z the outcome, underlies most of the existing causal inference methods for genomic data based on the principle of Mendelian randomization (PMR) (e.g., Didelez and Sheehan, 2007; Lawlor et al, 2008; Millstein et al, 2009; Smith and Hemani, 2014; Millstein et al, 2016; Wang and Michoel, 2017; Yang et al, 2017; Hemani et al, 2018; Verbanck et al, 2018; Howey et al, 2020; Zhao et al, 2020). Whereas these methods use the genetic variant as the instrumental variable to account for unobserved confounding, we assume causal sufficiency, i.e., confounding variables are fully observed and may be incorporated into the network inference (Spirtes et al, 2000). Our package further provides alternative approaches to graph visualization and graph comparison that are unavailable in the bnlearn and pcalg packages.
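
The independence pattern implied by the canonical model X→Y→Z can be illustrated with a small simulation. The sketch below is not the MRPC implementation: the simulated effect sizes, the allele frequency of 0.3, and the helper names `corr` and `partial_corr` are all assumptions chosen for illustration. Under this chain, X and Z are marginally correlated but become approximately uncorrelated once Y is conditioned on, which is the kind of marginal and conditional independence evidence that constraint-based methods test for.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, z, y):
    """First-order partial correlation of x and z given y."""
    rxz, rxy, rzy = corr(x, z), corr(x, y), corr(z, y)
    return (rxz - rxy * rzy) / math.sqrt((1 - rxy ** 2) * (1 - rzy ** 2))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000

# X: genotype coded 0/1/2 (two alleles, assumed frequency 0.3).
x = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
# Y: exposure driven by X plus noise; Z: outcome driven by Y plus noise.
y = [xi + rng.gauss(0, 1) for xi in x]
z = [yi + rng.gauss(0, 1) for yi in y]

marginal = corr(x, z)                # clearly nonzero under X -> Y -> Z
conditional = partial_corr(x, z, y)  # close to zero under X -> Y -> Z

print(f"corr(X, Z)     = {marginal:.3f}")
print(f"corr(X, Z | Y) = {conditional:.3f}")
```

A formal test would convert these correlations into test statistics and p-values; the sketch only shows the qualitative contrast between the marginal and the conditional association.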

METHOD
RESULTS
DISCUSSION
DATA AVAILABILITY STATEMENT