Abstract

While genome-wide association studies (GWAS) have been successful in identifying a large number of variants associated with disease, the challenge of locating the underlying causal loci remains. Sequencing of case and control DNA pools provides an inexpensive method for assessing all variation in a genomic region surrounding a significant GWAS result. However, individual variants need to be ranked by the strength of their association with disease in order to prioritise follow-up by individual genotyping. A simple method for testing for case-control association in sequence data from DNA pools is presented. It partitions the variance in allele frequency estimates into components due to the sampling of chromosomes from the pool during sequencing, the sampling of individuals from the population, and unequal contribution from individuals during pool construction. The utility of this method is demonstrated using sequence from the alcohol dehydrogenase (ADH) gene cluster in a case-control sample for heavy alcohol consumption.

Highlights

  • A large number of genetic associations with disease have been discovered in recent years [1]

  • While the cost of sequencing individual samples is rapidly decreasing, it remains, and likely will remain for the immediate future, more cost-efficient to identify all genetic variants through sequencing of DNA pools, followed by individual genotyping of the set of variants that are most strongly associated with case/control status

  • While sequencing DNA pools presents challenges in accurately detecting rare variants with high sensitivity [2,3,4], this is relatively unimportant when following up an observed association with a common variant as it is unlikely that phenotypic associations with common variants are driven by single or multiple rare causal variants [5,6]

Introduction

A large number of genetic associations with disease have been discovered in recent years [1]. While the cost of sequencing individual samples is rapidly decreasing, it remains, and likely will remain for the immediate future, more cost-efficient to identify all genetic variants through sequencing of DNA pools, followed by individual genotyping of the set of variants that are most strongly associated with case/control status. Testing for association in DNA sequence from pools of cases and controls without correcting for the underlying sources of variation will result in a large inflation of the distribution of the test statistic relative to the null distribution [7] and, less obviously, may result in the incorrect ranking of SNPs for follow-up. While methodology has been developed for case-control association analysis from DNA sequencing of pools, most methods have focused on the detection of association with rare variants [8,9], while those for common variants rely on complex and computationally intensive models [10,11].
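
To make the inflation argument concrete, the following is a minimal sketch of a pooled case-control test for a single variant. It is not the estimator from the paper: the function names, the `pool_var` parameter and the simple additive variance model are assumptions made for illustration. The sketch compares a naive statistic, which accounts only for the sampling of individuals from the population, with one whose denominator also includes the sampling of reads from each pool and an optional per-pool construction variance.

```python
from math import erfc, sqrt


def pooled_allele_test(alt_case, depth_case, n_case,
                       alt_ctrl, depth_ctrl, n_ctrl, pool_var=0.0):
    """Return (naive, corrected) 1-df chi-square statistics for one variant.

    alt_*   : reads carrying the alternate allele in each pool
    depth_* : total reads covering the site in each pool
    n_*     : number of diploid individuals in each pool
    pool_var: assumed extra variance per pool from unequal DNA contribution
              (hypothetical parameter; it would have to be estimated elsewhere)
    """
    p_case = alt_case / depth_case
    p_ctrl = alt_ctrl / depth_ctrl
    # Allele frequency under the null, pooling reads from both pools.
    p = (alt_case + alt_ctrl) / (depth_case + depth_ctrl)
    pq = p * (1.0 - p)
    if pq == 0.0:
        raise ValueError("site is monomorphic in the combined reads")

    # Sampling 2N chromosomes from the population into each pool.
    var_individuals = pq * (1.0 / (2 * n_case) + 1.0 / (2 * n_ctrl))
    # Sampling reads (chromosomes) from each pool during sequencing.
    var_reads = pq * (1.0 / depth_case + 1.0 / depth_ctrl)
    # Pool construction error, assumed equal and independent in both pools.
    var_pool = 2.0 * pool_var

    diff_sq = (p_case - p_ctrl) ** 2
    naive = diff_sq / var_individuals
    corrected = diff_sq / (var_individuals + var_reads + var_pool)
    return naive, corrected


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))


if __name__ == "__main__":
    naive, corrected = pooled_allele_test(60, 200, 100, 40, 200, 100)
    print(f"naive:     chi2={naive:.3f}, p={chi2_1df_pvalue(naive):.4f}")
    print(f"corrected: chi2={corrected:.3f}, p={chi2_1df_pvalue(corrected):.4f}")
```

With these illustrative counts the read depth equals the number of chromosomes in each pool, so the read-sampling term doubles the variance and halves the statistic. Ignoring that term would overstate the evidence for association, and because depth varies between sites, it could also change the relative ranking of variants.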
