Deep post-GWAS analysis identifies potential risk genes and risk variants for Alzheimer’s disease, providing new insights into its disease mechanisms

Zhen Wang, Nha Nguyen, Zhengdong D. Zhang, Jhih-Rong Lin, Joydeep Mitra, Quanwei Zhang, M. Reza Jabalameli

doi:10.1038/s41598-021-99352-3


Open Access

https://doi.org/10.1038/s41598-021-99352-3


Abstract

Alzheimer’s disease (AD) is a genetically complex, multifactorial neurodegenerative disease. It affects more than 45 million people worldwide and currently remains untreatable. Although genome-wide association studies (GWAS) have identified many AD-associated common variants, only about 25 genes are currently known to affect the risk of developing AD, despite its highly polygenic nature. Moreover, the risk variants underlying GWAS AD-association signals remain unknown. Here, we describe a deep post-GWAS analysis of AD-associated variants, using an integrated computational framework for predicting both disease genes and their risk variants. We identified 342 putative AD risk genes in 203 risk regions spanning 502 AD-associated common variants. 246 AD risk genes have not been identified as AD risk genes by previous GWAS collected in GWAS catalogs, and 115 of 342 AD risk genes are outside the risk regions, likely under the regulation of transcriptional regulatory elements contained therein. Even more significantly, for 109 AD risk genes, we predicted 150 risk variants, of both coding and regulatory (in promoters or enhancers) types, and 85 (57%) of them are supported by functional annotation. In-depth functional analyses showed that AD risk genes were overrepresented in AD-related pathways or GO terms—e.g., the complement and coagulation cascade and phosphorylation and activation of immune response—and their expression was relatively enriched in microglia, endothelia, and pericytes of the human brain. We found nine AD risk genes—e.g., IL1RAP, PMAIP1, LAMTOR4—as predictors for the prognosis of AD survival and genes such as ARL6IP5 with altered network connectivity between AD patients and normal individuals involved in AD progression. Our findings open new strategies for developing therapeutics targeting AD risk genes or risk variants to influence AD pathogenesis.
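The counts reported in the abstract imply a few derived figures that are not stated outright. The short sketch below recomputes them from the reported numbers only. The variable names are illustrative, and the two subtractions rest on assumptions: that the genes not flagged as new were previously reported in GWAS catalogs, and that every risk gene lies either inside or outside a risk region.

```python
# Counts as reported in the abstract (variable names are illustrative).
total_genes = 342            # putative AD risk genes
novel_genes = 246            # not identified by previous GWAS in GWAS catalogs
genes_outside_regions = 115  # located outside the 203 risk regions
risk_variants = 150          # predicted risk variants (for 109 genes)
supported_variants = 85      # supported by functional annotation

# Derived figures (assumptions: see the paragraph above).
previously_reported = total_genes - novel_genes
genes_inside_regions = total_genes - genes_outside_regions
novel_fraction = novel_genes / total_genes
supported_fraction = supported_variants / risk_variants

print(f"previously reported genes: {previously_reported}")
print(f"genes inside risk regions: {genes_inside_regions}")
print(f"share of genes that are new: {novel_fraction:.0%}")
print(f"share of variants with annotation support: {supported_fraction:.0%}")
```

The last figure reproduces the 57% quoted in the abstract (85 of 150).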

Highlights

Alzheimer’s disease (AD) is a genetically complex, multifactorial neurodegenerative disease
Genome-wide association studies (GWAS) revealed a large number of AD-associated genetic loci (Supplementary Fig. S1 and Supplementary Table S1), including SORL1, ABCA7, CLU, CR1, INPP5D, CD33, BIN1, PICALM, PTK2B, and APOE, a locus that has been repeatedly validated across different studies[4]
We aimed to first compile a list of high-confidence AD risk genes derived from association signals, systematically uncover the characteristics of the identified AD risk genes, including the level and variation of their expression in different types of cells, and use a computational framework that we developed to identify putative risk variants connected to AD risk genes

Summary

Introduction

Alzheimer’s disease (AD) is a genetically complex, multifactorial neurodegenerative disease. Two recent meta-analyses of large cohorts of late-onset AD (LOAD; n = 455,258 and 94,437) identified 29[5] and 25[6] risk loci, respectively. Interpretation of these results remains elusive because GWAS detect only statistical associations among a subset of all variants, and ~86% of AD-associated SNPs are non-coding (either intronic or intergenic; Supplementary Fig. S1). To this end, we sought to predict AD risk genes and risk variants by integrating genomic data from multiple sources, including GWAS signals from the GWAS Catalog, disease gene databases (MalaCards[10], DISEASES[11], and DisGeNET v5.0[12]), functional annotation of genetic variants (LINSIGHT[13], ExPecto[14], and PrimateAI[15]), and the 1000 Genomes Project. Our results provide novel biological insights into the genetic architecture, expression profiles, and functional pathways involved in AD etiology, as well as a basis for future therapeutic development for the disease.

Objectives

Methods

Results