Abstract

We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.
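The bias from naive cases-only discovery can be illustrated with a small null simulation. The sketch below is not the bias correction described in this paper. It uses the cruder remedy of discarding all sequenced cases from the test, whereas the paper shows that some sequencing data may be retained. All names and parameter values (`simulate_null`, `n_sequenced`, an allele frequency of 0.002, and so on) are assumptions chosen for illustration only.

```python
import random


def allele_count(n_chromosomes, freq, rng):
    """Count minor alleles among n_chromosomes independent chromosomes."""
    return sum(1 for _ in range(n_chromosomes) if rng.random() < freq)


def simulate_null(n_sites=2000, freq=0.002, n_cases=500, n_controls=500,
                  n_sequenced=100, seed=1):
    """Simulate rare variants with NO true association (hypothetical setup).

    A site is 'discovered' only if at least one sequenced case carries it.
    Returns two case/control ratios of per-chromosome allele rates at the
    discovered sites:
      naive    - all cases, including the sequenced ones
      excluded - only the cases that were not used for discovery
    """
    rng = random.Random(seed)
    seq_total = rest_total = ctrl_total = 0
    for _ in range(n_sites):
        x_seq = allele_count(2 * n_sequenced, freq, rng)
        x_rest = allele_count(2 * (n_cases - n_sequenced), freq, rng)
        x_ctrl = allele_count(2 * n_controls, freq, rng)
        if x_seq > 0:  # variant found by sequencing cases only
            seq_total += x_seq
            rest_total += x_rest
            ctrl_total += x_ctrl
    ctrl_rate = ctrl_total / (2 * n_controls)
    naive_rate = (seq_total + rest_total) / (2 * n_cases)
    excluded_rate = rest_total / (2 * (n_cases - n_sequenced))
    return naive_rate / ctrl_rate, excluded_rate / ctrl_rate


if __name__ == "__main__":
    naive, excluded = simulate_null()
    print(f"naive case/control rate ratio:    {naive:.2f}")
    print(f"excluding sequenced cases, ratio: {excluded:.2f}")
```

Under these assumptions the naive ratio exceeds 1 even though no variant affects disease risk, because every discovered site is guaranteed to have at least one copy among the sequenced cases. Dropping the sequenced cases brings the ratio back to about 1, at the cost of discarding data.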

Highlights

  • We address some statistical issues raised by the discovery of new genetic variants in the context of case-control association studies

  • Resequencing a subset of individuals can be used to expand the catalogue of variants that less expensive genotyping methods will recognize in the remaining individuals

  • We focus primarily on a simple case-control setting, and a simple test of the collective effect of uncommon or rare alleles


Summary

Introduction

We address some statistical issues raised by the discovery of new genetic variants in the context of case-control association studies. Genome-wide association (GWA) studies are generally based on established sets of single-nucleotide polymorphisms (SNPs). Even with genome-wide resequencing of expressed genes, an investigator may need a locus-focused effort to discover variation in regulatory regions, or there may be a need to probe for the newly discovered variants in a larger set of individuals. Any of these situations is likely to raise the issues we address.

