Novel Variance-Component TWAS method for studying complex human diseases with applications to Alzheimer's dementia.

Shizhen Tang,Jingjing Yang,Michael P Epstein,Aron S Buchman,David A Bennett,Philip L De Jager,Lin Chen

doi:10.1371/journal.pgen.1009482

Shizhen Tang, Jingjing Yang + Show 5 more

Open Access

https://doi.org/10.1371/journal.pgen.1009482

Copy DOI

Abstract

Transcriptome-wide association studies (TWAS) have been widely used to integrate transcriptomic and genetic data to study complex human diseases. Within a test dataset lacking transcriptomic data, traditional two-stage TWAS methods first impute gene expression by creating a weighted sum that aggregates SNPs with their corresponding cis-eQTL effects on reference transcriptome. Traditional TWAS methods then employ a linear regression model to assess the association between imputed gene expression and test phenotype, thereby assuming the effect of a cis-eQTL SNP on test phenotype is a linear function of the eQTL's estimated effect on reference transcriptome. To increase TWAS robustness to this assumption, we propose a novel Variance-Component TWAS procedure (VC-TWAS) that assumes the effects of cis-eQTL SNPs on phenotype are random (with variance proportional to corresponding reference cis-eQTL effects) rather than fixed. VC-TWAS is applicable to both continuous and dichotomous phenotypes, as well as individual-level and summary-level GWAS data. Using simulated data, we show VC-TWAS is more powerful than traditional TWAS methods based on a two-stage Burden test, especially when eQTL genetic effects on test phenotype are no longer a linear function of their eQTL genetic effects on reference transcriptome. We further applied VC-TWAS to both individual-level (N = ~3.4K) and summary-level (N = ~54K) GWAS data to study Alzheimer's dementia (AD). With the individual-level data, we detected 13 significant risk genes including 6 known GWAS risk genes such as TOMM40 that were missed by traditional TWAS methods. With the summary-level data, we detected 57 significant risk genes considering only cis-SNPs and 71 significant genes considering both cis- and trans- SNPs, which also validated our findings with the individual-level GWAS data. Our VC-TWAS method is implemented in the TIGAR tool for public use.

Highlights

Genome-wide association studies (GWAS) have succeeded in identifying thousands of genetic loci associated with complex traits and diseases [1,2,3]
Our simulation studies showed that VC-Transcriptome-wide association studies (TWAS) achieved higher power compared to traditional TWAS methods based on a two-stage Burden test, when the underlying assumptions required by traditional TWAS tools were violated
With individual-level data, we detected 13 significant risk genes including 6 known GWAS risk genes such as TOMM40 that were missed by traditional TWAS methods

Summary

Introduction

Genome-wide association studies (GWAS) have succeeded in identifying thousands of genetic loci associated with complex traits and diseases [1,2,3]. Traditional two-stage TWAS tools such as PrediXcan [10], FUSION [11], and TIGAR [8] employ different regression methods to fit such gene expression imputation models, where corresponding cis-eQTL effect sizes are derived. PrediXcan method [10] implements the Elastic-Net penalized regression method [12] while TIGAR [8] implements the nonparametric Bayesian Dirichlet process regression (DPR) [13] method Regardless of technique, these existing two-stage TWAS methods produce a set of estimates of cis-eQTL effects on reference transcriptome. Once created, existing two-stage TWAS tools test for association between the imputed GReX and traits of interest using GWAS data, where the test is based on a single variant linear regression model. TWAS methods have been deployed widely since their inception and have had substantial success in improving our understanding the genetic regulation of various complex traits [14]

Methods

Results

Discussion

Conclusion