E-MAGMA: an eQTL-informed method to identify risk genes using genome-wide association study summary statistics.

Zachary F Gerring,Angela Mina-Vargas,Eske M Derks,Eric R Gamazon,Alfonso Valencia

doi:10.1093/bioinformatics/btab115

Abstract

Genome-wide association studies have successfully identified multiple independent genetic loci that harbour variants associated with human traits and diseases, but the exact causal genes are largely unknown. Common genetic risk variants are enriched in non-protein-coding regions of the genome and often affect gene expression (expression quantitative trait loci, eQTL) in a tissue-specific manner. To address this challenge, we developed a methodological framework, E-MAGMA, which converts genome-wide association summary statistics into gene-level statistics by assigning risk variants to their putative genes based on tissue-specific eQTL information. We compared E-MAGMA to three eQTL informed gene-based approaches using simulated phenotype data. Phenotypes were simulated based on eQTL reference data using GCTA for all genes with at least one eQTL at chromosome 1. We performed 10 simulations per gene. The eQTL-h2 (i.e. the proportion of variation explained by the eQTLs) was set at 1%, 2% and 5%. We found E-MAGMA outperforms other gene-based approaches across a range of simulated parameters (e.g. the number of identified causal genes). When applied to genome-wide association summary statistics for five neuropsychiatric disorders, E-MAGMA identified more putative candidate causal genes compared to other eQTL-based approaches. By integrating tissue-specific eQTL information, these results show E-MAGMA will help to identify novel candidate causal genes from genome-wide association summary statistics and thereby improve the understanding of the biological basis of complex disorders. A tutorial and input files are made available in a github repository: https://github.com/eskederks/eMAGMA-tutorial. Supplementary data are available at Bioinformatics online.

Full Text