Abstract
A central focus of complex disease genetics after genome-wide association studies (GWAS) is to identify low-frequency and rare risk variants, which may account for an important fraction of disease heritability unexplained by GWAS. A profusion of studies using next-generation sequencing are seeking such risk alleles. We describe how already-known complex trait loci (largely from GWAS) can be used to guide the design of these new studies by selecting cases, controls, or families who are most likely to harbor undiscovered risk alleles. We show that genetic risk prediction can select unrelated cases from large cohorts who are enriched for unknown risk factors, or multiply-affected families that are more likely to harbor high-penetrance risk alleles. We derive the frequency of an undiscovered risk allele in selected cases and controls, and show how this relates to the variance explained by the risk score, the disease prevalence and the population frequency of the risk allele. We also describe a new method for informing the design of sequencing studies using genetic risk prediction in large, partially genotyped families, based on an extension of the Inside-Outside algorithm for inference on trees. We explore several study design scenarios using both simulated and real data, and show that in many cases genetic risk prediction can provide significant increases in power to detect low-frequency and rare risk alleles. The same approach can also be used to aid discovery of non-genetic risk factors, suggesting possible future utility of genetic risk prediction in conventional epidemiology. Software implementing the methods in this paper is available in the R package Mangrove.
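The case-selection idea can be illustrated with a small simulation. The sketch below is not the paper's derivation and does not use Mangrove. It assumes a simple liability threshold model in which liability is the sum of a known risk score, one undiscovered risk allele, and a residual term. The function name `simulate` and all parameter values (prevalence, variance explained, allele frequency, effect size) are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def simulate(n=200_000, prevalence=0.05, v_known=0.2, allele_freq=0.01,
             beta=1.0, bottom_fraction=0.2, seed=1):
    """Compare the frequency of an undiscovered risk allele in all cases
    with its frequency in cases that have a low known risk score.

    Assumed model (illustrative only): liability has total variance 1 and
    is the sum of a known score, a centered allele effect, and a residual.
    """
    rng = random.Random(seed)
    # Liability variance contributed by the undiscovered allele.
    v_allele = 2 * allele_freq * (1 - allele_freq) * beta ** 2
    sd_known = v_known ** 0.5
    sd_resid = (1 - v_known - v_allele) ** 0.5
    # Threshold from a normal approximation to the liability distribution.
    threshold = NormalDist().inv_cdf(1 - prevalence)

    cases = []
    for _ in range(n):
        score = rng.gauss(0, sd_known)
        # Diploid genotype: number of risk alleles carried (0, 1, or 2).
        count = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        liability = (score + beta * (count - 2 * allele_freq)
                     + rng.gauss(0, sd_resid))
        if liability > threshold:
            cases.append((score, count))

    # Select the cases whose known risk score is lowest.
    cases.sort(key=lambda case: case[0])
    low = cases[:int(len(cases) * bottom_fraction)]

    def freq(group):
        return sum(count for _, count in group) / (2 * len(group))

    return {
        "population": allele_freq,
        "all_cases": freq(cases),
        "low_score_cases": freq(low),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, cases with low known risk scores must reach the disease threshold through other factors, so the undiscovered allele is more frequent among them than among cases overall. Raising `v_known` or lowering `bottom_fraction` gives a rough sense of how the variance explained by the score and the stringency of selection affect this enrichment.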
Highlights
Risk for many complex diseases [1,2] can be partially predicted by a variety of proven risk factors, both genetic and environmental
We have presented a new method for genetic risk prediction that uses known risk variants of the type found by genome-wide association studies (GWAS) to inform the design of sequencing studies aimed at finding low-frequency or rare risk alleles
The power increase is greatest when it is possible to sample from a large pool of potential individuals or families for sequencing, and when a substantial fraction of heritability is explained by known GWAS loci
Summary
Risk for many complex diseases [1,2] can be partially predicted by a variety of proven risk factors, both genetic and environmental. While the clinical utility of these predictors is widely debated [3], the potential to use them in the design of future research has been less well studied. Although genome-wide association studies (GWAS) are widely known to have explained only a minority of variance in most complex diseases [4], the loci discovered are still able to predict many diseases as well as (or better than) established non-genetic risk factors [5]. Since GWAS were designed to detect common risk variants [6], a natural next step is to undertake studies that can detect low-frequency and rare risk variants using next-generation sequencing [7,8]. In this paper we investigate the potential power gained by using genetic risk factors established via GWAS in next-generation sequencing experiments.