Abstract
Major progress in disease genetics has been made through genome-wide association studies (GWASs). A key task in post-GWAS analysis is to identify causal noncoding variants with regulatory function. Here, using more than 2000 functional features, we developed a convolutional neural network (CNN) framework for combinatorial, nonlinear modeling of complex patterns shared by risk variants scattered across multiple associated loci. When the framework was applied to major psychiatric disorders and autoimmune diseases, neural and immune features, respectively, showed high explanatory power and reflected the pathophysiology of the relevant disease. The predicted causal variants were concentrated in active regulatory regions of relevant cell types and tended to be in physical contact with transcription factors, to reside in evolutionarily conserved regions, and to result in expression changes in genes related to the given disease. We present examples of novel candidate causal variants and their associated genes. Our method is expected to contribute to the identification and functional interpretation of potential causal noncoding variants in post-GWAS analyses.
Highlights
During the last decade, numerous efforts have been made to elucidate the genetic mechanisms underlying complex disorders.
We developed a deep learning framework based on convolutional neural networks (CNNs) to discover regulatory variants that may play a causal role in increasing the risk of five major psychiatric disorders and four autoimmune diseases: autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), bipolar disorder (BPD), major depressive disorder (MDD), schizophrenia (SCZ), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), Crohn’s disease (CD), and ulcerative colitis (UC).
Our CNN model was trained on the feature vectors of variants across multiple association blocks (Figure 1a and Supplementary Figure 1).
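The text above does not specify the network architecture, so the following is only a minimal NumPy sketch of the general idea, not the authors' model. It assumes that each variant is a fixed-order binary vector of functional annotations, that a single 1-D convolutional layer slides along that vector, and that global max pooling plus a logistic unit yields a per-variant score used to rank variants within an association block. The dimensions, weights, and function names (`init_params`, `conv1d_valid`, `score_variant`, `rank_block`) are hypothetical. The weights are random and untrained, so the example shows only the forward pass and ranking, not how the model is fitted.

```python
import numpy as np

# Hypothetical toy dimensions; the paper uses >2000 functional features per variant.
N_FEATURES = 20   # F: length of each variant's annotation vector (assumed)
N_FILTERS = 4     # K: number of convolutional filters (assumed)
WIDTH = 5         # W: filter width along the feature axis (assumed)


def init_params(rng):
    """Hypothetical random, untrained weights: one conv layer plus a logistic output."""
    return {
        "kernels": rng.normal(scale=0.1, size=(N_FILTERS, WIDTH)),
        "conv_bias": np.zeros(N_FILTERS),
        "w_out": rng.normal(scale=0.1, size=N_FILTERS),
        "b_out": 0.0,
    }


def conv1d_valid(x, kernels, bias):
    """Cross-correlate each filter with feature vector x ('valid' padding).

    Returns an array of shape (K, F - W + 1).
    """
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)  # (F - W + 1, W)
    return kernels @ windows.T + bias[:, None]


def score_variant(x, params):
    """Map one variant's feature vector to a score in (0, 1)."""
    maps = np.maximum(conv1d_valid(x, params["kernels"], params["conv_bias"]), 0.0)  # ReLU
    pooled = maps.max(axis=1)                       # global max pooling -> (K,)
    logit = pooled @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid


def rank_block(block, params):
    """Score every variant in one association block; return indices best-first."""
    scores = np.array([score_variant(v, params) for v in block])
    return np.argsort(-scores), scores


rng = np.random.default_rng(0)
params = init_params(rng)
# Two toy association blocks with binary annotations (1 = variant overlaps the feature).
blocks = [rng.integers(0, 2, size=(n, N_FEATURES)).astype(float) for n in (6, 3)]
for i, block in enumerate(blocks):
    order, scores = rank_block(block, params)
    print(f"block {i}: top candidate = variant {order[0]} (score {scores[order[0]]:.3f})")
```

Ordering annotations so that a convolution can act on them is itself an assumption here. A real model would also need a training objective and labels, which this sketch deliberately leaves out.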
Summary
Numerous efforts have been made to elucidate the genetic mechanisms underlying complex disorders, and major progress has been made through genome-wide association studies (GWASs). However, pinpointing the DNA variants that actually increase the risk of the associated disease remains a major challenge for GWASs [1]. GWASs cannot identify causal variants directly; they can only report linkage disequilibrium (LD) blocks that contain many neutral SNPs linked to the causal variants. The majority of disease-associated DNA variants are thought to alter not the gene itself but the regulation of gene expression [2], and our incomplete knowledge of noncoding regions limits the functional interpretation of such variants. The wealth of cell-type-specific human epigomic data helps with the identification of functional noncoding variants [1, 3, 4, 5].