Abstract
Although large amounts of genomic data are available, it remains a challenge to reliably infer causal (i.e., regulatory) relationships among molecular phenotypes (such as gene expression), especially when multiple phenotypes are involved. We extend the interpretation of the Principle of Mendelian randomization (PMR) and present MRPC, a novel machine learning algorithm that incorporates the PMR into the PC algorithm, a classical algorithm for learning causal graphs in computer science. By integrating individual-level genotype and molecular phenotype data, MRPC efficiently and robustly learns a causal biological network in which directed edges indicate causal directions. We demonstrate through simulation that MRPC outperforms several popular general-purpose network inference methods and PMR-based methods. We apply MRPC to distinguish direct and indirect targets among multiple genes associated with expression quantitative trait loci. Our method is implemented in the R package MRPC, available on CRAN (https://cran.r-project.org/web/packages/MRPC/index.html).
Highlights
Experiments have been conducted to understand the causal relationships among genes (Segal et al, 2003; Housden et al, 2013), or between an expression quantitative trait locus (eQTL) and its direct and indirect target genes (Cheung and Spielman, 2009).
When we examine relationships among gene expression or other molecular phenotypes, it is usually not known beforehand which of T1 and T2 is more likely to be the outcome of the other, and Model 1 alone does not have the flexibility to examine additional possibilities.
We examined the association between each of the top 10 PCs and the eQTL-gene sets, identified statistically significant associations, and applied MRPC jointly for the eQTL-gene set and the associated PC.
Summary
Experiments (e.g., temporal transcription or protein expression assays, gene knockouts or knockdowns) have been conducted to understand the causal relationships among genes (Segal et al, 2003; Housden et al, 2013), or between an expression quantitative trait locus (eQTL) and its direct and indirect target genes (Cheung and Spielman, 2009). It is even harder to learn (i.e., infer) a causal network of multiple genes, which may represent which genes regulate which others (Hill et al, 2016; Ahmed et al, 2018). We address this problem in this paper.
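To make the underlying idea concrete, the following is a minimal sketch of how a genotype can help orient the edge between two phenotypes, using the T1 and T2 notation from the highlights. It is not the MRPC implementation and does not use the MRPC R package. The simulated chain V -> T1 -> T2, the effect sizes, the sample size, and the helper names `partial_corr` and `simulate_chain` are all assumptions chosen for illustration. Under this chain, the genotype V should be nearly uncorrelated with T2 once T1 is accounted for, whereas V and T1 should remain correlated when T2 is accounted for.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def simulate_chain(n=5000, seed=0):
    """Simulate a hypothetical chain V -> T1 -> T2 (illustrative values only)."""
    rng = np.random.default_rng(seed)
    # Genotype coded as 0/1/2 copies of an allele with assumed frequency 0.3.
    v = rng.binomial(2, 0.3, size=n).astype(float)
    # T1 depends on the genotype; T2 depends only on T1.
    t1 = 1.0 * v + rng.normal(size=n)
    t2 = 1.0 * t1 + rng.normal(size=n)
    return v, t1, t2


v, t1, t2 = simulate_chain()

# Under V -> T1 -> T2, conditioning on T1 should remove the V-T2 association.
r_v_t2_given_t1 = partial_corr(v, t2, t1)
# Conditioning on T2 should not remove the V-T1 association.
r_v_t1_given_t2 = partial_corr(v, t1, t2)

print(f"partial corr(V, T2 | T1) = {r_v_t2_given_t1:.3f}")
print(f"partial corr(V, T1 | T2) = {r_v_t1_given_t2:.3f}")
```

The asymmetry between the two partial correlations is what allows the direction T1 -> T2 to be preferred over T2 -> T1 in this toy setting. A full method would replace the raw comparison with formal conditional independence tests and would consider further models beyond this single chain.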