Abstract

A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. GWAS results typically do not translate directly into causal variants because the majority of hits are in non-coding or intergenic regions, and linkage disequilibrium spreads the statistical signal across multiple variants. Post-GWAS annotation facilitates the selection of the most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet using them can be time-consuming and they do not provide integrated visual aids for data interpretation. We therefore developed FUMA: an integrative web-based platform that uses information from multiple biological resources to facilitate functional annotation of GWAS results, gene prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.

Highlights

  • We introduce FUMA, a web application that processes GWAS summary statistics, annotates and prioritizes SNPs and genes, and facilitates interpretation through interactive visualizations

  • FUMA provides the rationale for prioritizing a gene, for example when the expression of the prioritized gene is altered by a SNP that is associated with the disease of interest

  • We identified additional putative causal genes by performing chromatin interaction mapping on results from three GWAS (BMI, Crohn’s disease[47] (CD), and SCZ). The genes identified from chromatin interaction information were mostly located outside the risk loci and were shown to share functions with known candidates

Methods

To compute minor allele frequencies and LD structure, we used data from the 1000 Genomes Project[27] phase 3 (1000G). Minor allele frequencies and r² for pairs of SNPs (minimum r² = 0.05; maximum distance between a pair of SNPs, 1 Mb) were pre-computed using PLINK[26] for each of the available populations (AFR, AMR, EAS, EUR, and SAS). Cis-eQTL information was obtained from four data repositories: GTEx portal v6[8], Blood eQTL browser[16], BIOS QTL Browser[17], and BRAINEAC[18]; genes were mapped to Ensembl gene IDs where necessary (Supplementary Note 2). Pre-processed Hi-C data for 14 tissue types and seven cell lines were obtained from GSE87112[11] (Supplementary Note 3). Normalized gene expression data (RPKM, Reads Per Kilobase per Million) from GTEx portal v6[8] for 53 tissue types were processed for different purposes. Curated pathways and gene sets from MSigDB v5.2[21] and WikiPathways[22] which are assigned Entrez ID
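
The pre-computation of minor allele frequencies and pairwise r² described above can be illustrated with a small sketch. This is not the PLINK procedure used in the study: it assumes genotypes coded as 0/1/2 allele counts in a NumPy array and takes r² to be the squared Pearson correlation of those counts. The function names `minor_allele_freq` and `pairwise_r2` and the toy data are hypothetical; only the thresholds (minimum r² of 0.05, maximum distance of 1 Mb) come from the text.

```python
import numpy as np

def minor_allele_freq(genotypes):
    """Per-SNP minor allele frequency.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts (assumed coding).
    """
    allele_freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(allele_freq, 1.0 - allele_freq)

def pairwise_r2(genotypes, positions, min_r2=0.05, max_dist=1_000_000):
    """Return (i, j, r2) for SNP pairs within max_dist bp and with r2 >= min_r2.

    r2 is approximated here as the squared Pearson correlation of allele counts.
    """
    n_snps = genotypes.shape[1]
    pairs = []
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            # Skip pairs farther apart than the distance window.
            if abs(positions[j] - positions[i]) > max_dist:
                continue
            gi, gj = genotypes[:, i], genotypes[:, j]
            # Monomorphic SNPs have zero variance, so correlation is undefined.
            if gi.std() == 0 or gj.std() == 0:
                continue
            r = np.corrcoef(gi, gj)[0, 1]
            if r * r >= min_r2:
                pairs.append((i, j, float(r * r)))
    return pairs

# Toy data: 4 samples x 3 SNPs; SNPs 0 and 1 are identical, SNP 2 is ~2 Mb away.
genotypes = np.array([[0, 0, 2],
                      [1, 1, 0],
                      [2, 2, 1],
                      [0, 0, 1]])
positions = np.array([100, 500, 2_000_000])

maf = minor_allele_freq(genotypes)
pairs = pairwise_r2(genotypes, positions)
print(maf)
print(pairs)
```

In this toy input only the pair (0, 1) is retained, because SNP 2 lies more than 1 Mb from the other two. The quadratic loop is kept for readability; a genome-scale computation would rely on a windowed, optimized tool such as the one named in the text.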
