Abstract
Genome-wide association studies (GWAS) are widely applied to identify susceptibility loci for a variety of diseases using genotyping arrays that interrogate known polymorphisms throughout the genome. A particular strength of GWAS is that they are unbiased with respect to specific genomic elements (e.g., coding or regulatory regions of genes), and they have revealed important associations that would never have been suspected on the basis of prior knowledge or assumptions. To date, the SNPs found to be associated with complex human traits have tended to have small effect sizes, so very large sample sizes are required to achieve adequate statistical power. To address this, a number of efficient strategies have emerged for conducting GWAS, including combining results across multiple studies by meta-analysis, collecting cases through electronic health records, and using as controls samples from other studies that have already been genotyped and made publicly available (e.g., through deposition of de-identified data into dbGaP or EGA). In certain scenarios, it may be attractive to use already-genotyped controls and divert resources to standardized collection, phenotyping, and genotyping of cases only. This strategy, however, requires careful attention to the choice of “public controls” and to the comparability of genetic data between cases and public controls, to ensure that any allele frequency differences observed between groups are attributable to locus-specific effects rather than to systematic bias from poor matching (population stratification) or differential genotype calling (batch effects). The goal of this paper is to describe some of the potential pitfalls of using previously genotyped control data. We focus on considerations related to the choice of control groups, the use of different genotyping platforms, and approaches to dealing with population stratification when cases and controls are genotyped on different platforms.
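The comparison at the heart of this design is a per-SNP test of allele frequency between cases and controls, and systematic bias from stratification or batch effects typically shows up as inflation of these test statistics across the whole genome. The Python sketch below assumes a simple allelic (2×2 allele-count) chi-square test and uses the genomic inflation factor λ from genomic control as the diagnostic. These are common choices rather than necessarily the ones used by the authors, and the function names are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts. Hypothetical helper; assumes every row and column total is > 0."""
    observed = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def chi2_1df_pvalue(chi2):
    """Upper-tail p-value for 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))

def genomic_inflation(chi2_stats):
    """Genomic-control lambda: median observed statistic divided by its null median.
    Values well above 1 suggest genome-wide bias (e.g., stratification or
    differential genotype calling) rather than locus-specific effects."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

# Example with made-up counts: 1,000 alleles per group.
stat = allelic_chi2(300, 700, 250, 750)
print(round(stat, 3), chi2_1df_pvalue(stat))
```

In practice λ is only informative when computed over many SNPs, most of which are expected to be null. A value close to 1 is consistent with well-matched cases and controls, while a clearly elevated value is a warning sign for the kinds of bias described above.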
Highlights
Genome-wide association studies (GWAS) have been widely used in recent years as a tool for identifying susceptibility loci for a number of complex human traits and, in particular, multifactorial diseases
GWAS have been undeniably successful in identifying novel disease susceptibility loci (e.g., Billings and Florez, 2010; Teslovich et al., 2010; Chasman et al., 2012)
GWAS have made clear that very large sample sizes are required to detect trait-associated SNPs that have small effect sizes
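To show concretely why small effect sizes demand very large samples, the following sketch approximates the power of a two-sided allelic test using a normal approximation to the difference in allele frequencies. It assumes diploid samples with alleles sampled independently (Hardy-Weinberg equilibrium), an allelic odds ratio that converts the control allele frequency into the case frequency, and the conventional genome-wide threshold α = 5×10⁻⁸. The function names and example parameters are illustrative only.

```python
import math
from statistics import NormalDist

def case_allele_freq(p_ctrl, odds_ratio):
    """Case allele frequency implied by an allelic odds ratio and the control frequency."""
    return odds_ratio * p_ctrl / (1 + p_ctrl * (odds_ratio - 1))

def allelic_test_power(p_case, p_ctrl, n_cases, n_ctrls, alpha=5e-8):
    """Approximate power of a two-sided test of equal allele frequencies.
    Assumes two independently sampled alleles per person (Hardy-Weinberg)."""
    std_normal = NormalDist()
    alleles_case, alleles_ctrl = 2 * n_cases, 2 * n_ctrls
    # Unpooled standard error of the allele frequency difference.
    se = math.sqrt(p_case * (1 - p_case) / alleles_case
                   + p_ctrl * (1 - p_ctrl) / alleles_ctrl)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_ctrl) / se  # expected z-statistic under the alternative
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Illustrative: odds ratio 1.1 at a control allele frequency of 0.30.
p_ctrl = 0.30
p_case = case_allele_freq(p_ctrl, 1.1)
for n in (1_000, 5_000, 20_000):
    print(n, round(allelic_test_power(p_case, p_ctrl, n, n), 3))
```

Under these assumptions, a modest odds ratio of 1.1 is essentially undetectable at genome-wide significance with a few thousand cases and controls, and only becomes likely to be detected with tens of thousands per group.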
Summary
Genome-wide association studies (GWAS) have been widely used in recent years as a tool for identifying susceptibility loci for a number of complex human traits and, in particular, multifactorial diseases. An alternative approach to increasing the statistical power of a study at no extra cost is to devote available clinical and genotyping resources almost entirely to cases and to use publicly available data from already-genotyped samples as controls. Using available controls may be attractive for registry-based studies, from which a large number of cases can be rapidly identified. While this strategy has the obvious benefit of allocating scarce resources toward genotyping a larger number of cases, it can introduce bias into the study design and lead to spurious associations if not applied carefully. Even minor differences in genotype calling, possibly attributable to laboratory or technician bias, may translate into subtle but systematic differences in allele frequencies between cases and controls that can result in false-positive associations.
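The danger of subtle calling differences can be illustrated with a small deterministic sketch. It assumes a hypothetical artifact that shifts the observed allele frequency from 0.30 in the public controls to 0.31 in the cases at a SNP with no true effect, and applies the standard closed-form Pearson chi-square for a 2×2 table of allele counts. For a fixed frequency difference the statistic grows in proportion to sample size, so an artifact that is invisible in a small study can cross the genome-wide threshold in a large registry-based one.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Closed-form Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return numerator / denominator

def biased_null_snp(n_per_group, p_true=0.30, call_bias=0.01):
    """Chi-square at a SNP with no true effect, where a hypothetical batch
    artifact inflates the observed alternate-allele frequency in cases only."""
    alleles = 2 * n_per_group  # diploid: two alleles per person
    case_alt = round((p_true + call_bias) * alleles)
    ctrl_alt = round(p_true * alleles)
    return allelic_chi2(case_alt, alleles - case_alt, ctrl_alt, alleles - ctrl_alt)

GENOME_WIDE_CHI2 = 29.72  # approximate 1-df chi-square cutoff for p = 5e-8

for n in (1_000, 10_000, 100_000):
    stat = biased_null_snp(n)
    p_value = math.erfc(math.sqrt(stat / 2))
    print(n, round(stat, 2), p_value, stat > GENOME_WIDE_CHI2)
```

Because the artifact is systematic rather than random, collecting more cases does not dilute it. Instead, it makes the spurious signal stronger, which is why comparability of genotype calling between cases and public controls matters most in large studies.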