Abstract

Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.

Highlights

  • The challenges of precise identification of disease-causing variants underlying genome-wide association study (GWAS) signals have recently received much attention [1,2,3]

  • Current next generation sequencing (NGS) or imputation-based studies of either the whole genome or regions previously identified by GWAS have not yet been very successful in identifying causal variants

  • We show that various common factors, such as differential sequencing or imputation accuracy rates and linkage disequilibrium patterns, with or without GWAS-informed region selection, can substantially decrease the probability of identifying the correct causal SNP, often by more than half

Introduction

The challenges of precise identification of disease-causing variants underlying GWAS signals have recently received much attention [1,2,3]. For post-GWAS statistical analysis that aims to accurately identify potentially causal variants, a major hurdle is the development of methods to distinguish disease-causing variants from their highly correlated proxies. While GWAS-era statistical methods focused on identifying associated regions via tag SNPs at the coarse scale of GWAS arrays, next generation sequencing (NGS) technology offers the capability not only to detect associated regions, but also to distinguish the causal SNPs within these associated regions. GWAS and imputation studies typically report the top-ranked SNP for each associated locus, and follow-up studies typically attempt replication for these top-ranked SNPs (for further discussion of ranking see Text S1).
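To make concrete why the top-ranked SNP at a locus need not be the causal one, the following minimal sketch compares the expected strength of association at an imperfectly imputed causal SNP and at a directly genotyped tag SNP. It is not the re-ranking procedure developed in this paper. It assumes a common approximation in which the expected chi-square noncentrality at a SNP is the noncentrality at a perfectly measured causal SNP, scaled by the SNP's LD r² with the causal SNP and by its genotyping or imputation r². The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def expected_ncp(ncp_causal, r2_ld_with_causal, r2_genotyping):
    """Approximate expected chi-square noncentrality at a SNP.

    Assumption (not taken from the paper): the signal is attenuated
    multiplicatively by LD with the causal SNP and by genotyping accuracy.
    """
    return ncp_causal * r2_ld_with_causal * r2_genotyping


# Hypothetical locus: the causal SNP is imputed with r2 = 0.80, while a
# tag SNP in LD r2 = 0.90 with it is genotyped directly (r2 = 1.0).
NCP_CAUSAL = 30.0  # illustrative noncentrality under perfect genotyping

snps = {
    "causal (imputed)": expected_ncp(NCP_CAUSAL, 1.00, 0.80),
    "tag (genotyped)": expected_ncp(NCP_CAUSAL, 0.90, 1.00),
}

# Rank SNPs by expected evidence for association, strongest first.
ranked = sorted(snps, key=snps.get, reverse=True)
for name in ranked:
    print(f"{name}: expected NCP = {snps[name]:.1f}")
```

Under these assumed values the tag SNP has the larger expected noncentrality, so it would tend to be ranked above the causal SNP. Because both values scale with the same base noncentrality, increasing the sample size in this simplified model widens the absolute gap between them and does not reverse the ordering.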
