Abstract

The current paradigm of genomic studies of complex diseases is association and correlation analysis. Despite significant progress in dissecting the genetic architecture of complex diseases through genome-wide association studies (GWAS), the genetic variants identified by GWAS explain only a small proportion of the heritability of complex diseases, and a large fraction of the contributing genetic variants remains hidden. Association analysis has limited power to unravel the mechanisms of complex diseases. It is time to shift the paradigm of genomic analysis from association analysis to causal inference, which is an essential component of discovering disease mechanisms. This paper reviews the major past platforms of genomic analysis and discusses the perspectives of causal inference as a general framework for genomic analysis. In genomic data analysis, we usually consider four types of association: (1) discrete variables (DNA variation) with continuous variables (phenotypes and gene expression); (2) continuous variables (expression, methylation, and imaging signals) with continuous variables (gene expression, imaging signals, phenotypes, and physiological traits); (3) discrete variables (DNA variation) with a binary trait (disease status); and (4) continuous variables (gene expression, methylation, phenotypes, and imaging signals) with a binary trait (disease status). In this paper, we review algorithmic information theory as a general framework for causal discovery, together with recent developments in statistical methods for causal inference on discrete data, and we discuss the possibility of extending association analysis of a discrete variable with disease to causal analysis of a discrete variable and disease.

Highlights

  • By February 6th, 2017, a catalog of published genome-wide association studies (GWAS) had reported significant associations of 26,791 SNPs with more than 1,704 traits in 2,337 publications (A catalog of Published Genome-Wide Association Studies, 2017)¹

  • We suggest using the Shannon entropies H1(E) and H1(Ẽ) as dependence measures (DMs) in additive noise models (ANMs) and comparing H1(E) with H1(Ẽ)

  • We use the simulation experiments presented in Liu and Chan (2016) to compare the performance of three methods for causal inference with discrete variables: ANM, distance correlation, and entropy
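The entropy comparison in the highlights can be illustrated with a minimal Python sketch. It rests on several assumptions that are ours, not the paper's: H1 is read as Shannon entropy (Rényi entropy of order 1), estimated by plug-in frequencies; the discrete regression function is fitted by conditional mode, which is a simplification and not necessarily the paper's fitting procedure; the forward-model noise is labeled E and the backward-model noise Ẽ; and the decision rule "prefer the direction with the smaller residual entropy" is assumed. For contrast, the sketch also scores each direction by the distance correlation between the candidate cause and its residual, a standard independence-based ANM criterion that is not necessarily how Liu and Chan (2016) apply distance correlation. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Plug-in Shannon entropy in bits (Renyi entropy of order 1) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def fit_mode_regression(cause, effect):
    """Estimate f(c) as the most frequent effect value observed with cause == c.

    Assumption for illustration: conditional-mode regression; ties go to the
    smallest value so the fit is deterministic.
    """
    groups = defaultdict(Counter)
    for c, e in zip(cause, effect):
        groups[c][e] += 1
    return {c: max(cnt, key=lambda v: (cnt[v], -v)) for c, cnt in groups.items()}


def anm_residuals(cause, effect):
    """Residuals effect - f(cause) of a discrete additive noise model."""
    f = fit_mode_regression(cause, effect)
    return [e - f[c] for c, e in zip(cause, effect)]


def distance_correlation(a, b):
    """Sample (V-statistic) distance correlation of two 1-D numeric samples."""
    n = len(a)

    def double_centered(z):
        d = [[abs(zi - zj) for zj in z] for zi in z]
        means = [sum(row) / n for row in d]  # row means == column means (d is symmetric)
        grand = sum(means) / n
        return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]

    A, B = double_centered(a), double_centered(b)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_a = sum(v * v for row in A for v in row) / n**2
    dvar_b = sum(v * v for row in B for v in row) / n**2
    if dvar_a == 0 or dvar_b == 0:
        return 0.0
    # Clip tiny negative round-off before taking the square root.
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_a * dvar_b))


def compare_directions(x, y):
    """Fit ANMs in both directions and score each residual."""
    e_fwd = anm_residuals(x, y)  # X -> Y: Y = f(X) + E
    e_bwd = anm_residuals(y, x)  # Y -> X: X = g(Y) + E~
    return {
        "H1(E)": shannon_entropy(e_fwd),
        "H1(E~)": shannon_entropy(e_bwd),
        "dCor(X, E)": distance_correlation(x, e_fwd),
        "dCor(Y, E~)": distance_correlation(y, e_bwd),
    }


# Hypothetical toy data: Y = X^2 + E, with E independent of X by construction.
support = (-2, -1, 0, 1, 2)
noise = (-1, 0, 0, 1)
x = [xi for xi in support for _ in noise]
y = [xi * xi + e for xi in support for e in noise]

scores = compare_directions(x, y)
for name, value in scores.items():
    print(f"{name:12s} {value:.4f}")
# Assumed decision rule: prefer the direction with the smaller residual entropy.
direction = "X -> Y" if scores["H1(E)"] < scores["H1(E~)"] else "Y -> X"
print("inferred:", direction)
```

On this toy data the forward residual has entropy of exactly 1.5 bits and is independent of X, whereas the backward residual has higher entropy and depends on Y, so both criteria favor X → Y. Because residual entropy also depends on the range of each variable, comparing H1(E) with H1(Ẽ) directly is most meaningful when X and Y are on comparable scales; the sketch does not correct for this.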


Summary

Introduction

By February 6th, 2017, a catalog of published genome-wide association studies (GWAS) had reported significant associations of 26,791 SNPs with more than 1,704 traits in 2,337 publications (A catalog of Published Genome-Wide Association Studies, 2017)¹. In contrast to classical statistics, which measures the relationships between random variables by statistical dependence or association, algorithms for causal inference have been developed to discover the data-generating processes underlying statistical observations.
