Abstract

The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals (“hits”) to train a regularized logistic model to predict single nucleotide polymorphism (SNP) causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype-specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype-specific and the autoimmune GWAS signals gave the most reliable results. We found that SNPs with higher probabilities of causality based on functional characteristics were enriched for more significant p-values, compared to all GWAS SNPs, in three large GWAS of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.
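As a rough illustration of how a functional prediction and an association signal might be combined on the odds scale, the following minimal sketch may help. It is not the authors' implementation. It assumes that the functional Bayes factor can be approximated as the ratio of the model-predicted odds of causality to a baseline odds, and that functional and association evidence are independent given causal status. All function names and numeric values are hypothetical.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def functional_bayes_factor(p_functional, p_baseline):
    """Assumed form: model-predicted odds of causality divided by baseline odds.

    p_functional: predicted probability of causality from the functional model.
    p_baseline:   baseline probability of causality (e.g. proportion of hits
                  in the training data); a hypothetical input.
    """
    return odds(p_functional) / odds(p_baseline)


def posterior_probability(prior, bf_association, bf_functional=1.0):
    """Posterior probability from prior odds multiplied by both Bayes factors.

    Multiplying the two Bayes factors assumes the functional and association
    evidence are conditionally independent (an assumption of this sketch).
    """
    posterior_odds = odds(prior) * bf_association * bf_functional
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values, chosen only to show the direction of the effect.
bf_func = functional_bayes_factor(p_functional=0.2, p_baseline=0.1)
without_functional = posterior_probability(prior=1e-4, bf_association=50.0)
with_functional = posterior_probability(
    prior=1e-4, bf_association=50.0, bf_functional=bf_func
)
```

Under these assumptions, a SNP whose functional annotation predicts a higher-than-baseline probability of causality gets a functional Bayes factor above 1, which raises its posterior probability relative to using association evidence alone.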

Highlights

  • Genome-wide association studies (GWAS), which investigate the association between genetic variation and phenotypic traits, have identified many genes associated with human diseases [1]

  • Our analyses indicate that GWAS hits are enriched for most functional characteristics compared to GWAS non-hits, except for splice sites and microRNA targets, perhaps due to the very low frequency of these two classes of functional characteristics compared to the others (Table 1 and Table 2)

  • Our results confirm previous findings of differences in functional enrichment in GWAS hits compared to non-hits, which provided a rationale for utilizing functional characteristics as predictors of single nucleotide polymorphism (SNP) causality

Summary

Introduction

Genome-wide association studies (GWAS), which investigate the association between genetic variation and phenotypic traits, have identified many genes associated with human diseases [1]. Purcell et al. [2] showed that single nucleotide polymorphisms (SNPs) from GWAS with subthreshold p-values account for a considerable proportion of the variance in independent samples, suggesting that they are enriched for causal SNPs or their proxies. Emerging experimental data from various sources have suggested that the functional characteristics of specific genomic regions, such as histone modifications, DNase I hypersensitive sites, transcription factor binding sites, and expression quantitative trait loci (eQTL), among others, could offer biological explanations for many variants found to be associated with disease (for example: [3,4,5]). GWAS variants, or variants in perfect linkage disequilibrium (LD) with them, are more frequently localized to DNase I hypersensitive sites than would be expected by chance [8].
