Abstract

Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from that of common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10⁻⁸ and 8 × 10⁻⁸ for our analytic choices in window-based testing, and thresholds of 0.6 × 10⁻⁸ to 1.5 × 10⁻⁸ for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
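As a rough illustration of what an empirical threshold can look like in practice, the sketch below takes p-values computed on phenotypes simulated under the null and uses the lower alpha-quantile of the per-simulation minimum p-value as the threshold. This is a common way to control the family-wise error rate and is offered only as a minimal sketch, not as the procedure used in the paper. The function names, the array layout, and in particular the linear scaling in `extrapolate_threshold` are assumptions made for illustration; the abstract does not state how the extrapolation from a small region is performed.

```python
import numpy as np


def empirical_threshold(null_pvalues, alpha=0.05):
    """Estimate a family-wise significance threshold from null simulations.

    null_pvalues: array of shape (n_simulations, n_tests); each row holds the
    p-values of all window or variant tests for one phenotype simulated under
    the null (hypothetical layout, assumed for this sketch).
    """
    null_pvalues = np.asarray(null_pvalues, dtype=float)
    # Smallest p-value observed in each null simulation.
    min_p = null_pvalues.min(axis=1)
    # The alpha-quantile of the minima: a fraction alpha of null simulations
    # contain at least one p-value at or below this value.
    return float(np.quantile(min_p, alpha))


def extrapolate_threshold(region_threshold, region_fraction, alpha=0.05):
    """Scale a regional threshold to the whole genome.

    ASSUMPTION: the effective number of independent tests grows linearly
    with the amount of genome analysed. This is an illustrative choice,
    not the extrapolation model from the paper.
    """
    m_eff_region = alpha / region_threshold        # effective tests in region
    m_eff_genome = m_eff_region / region_fraction  # scaled to the full genome
    return alpha / m_eff_genome


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy input: 2000 null simulations of 500 independent uniform p-values.
    toy = rng.uniform(size=(2000, 500))
    region = empirical_threshold(toy)
    print(region, extrapolate_threshold(region, region_fraction=0.06))
```

With correlated tests, as in overlapping windows, the minimum p-value is less extreme than it would be for the same number of independent tests, so a threshold estimated this way is less stringent than a Bonferroni correction over the raw number of tests.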

Highlights

  • Complex-trait genetic association studies aim to identify robust associations between genotype and phenotype in order to enhance our understanding of the underlying biological processes contributing to the trait of interest

  • We undertook a complementary approach, using the whole-genome sequencing (WGS) data from chromosome 3 in the UK10K project, where phenotypes were simulated under the null and thresholds were estimated for three test statistics

  • We have presented an empirical approach for estimating genome-wide significance thresholds for the analysis of sequencing data and rare genetic variation

Summary

Introduction

Complex-trait genetic association studies aim to identify robust associations between genotype and phenotype in order to enhance our understanding of the underlying biological processes contributing to the trait of interest. The field of complex-trait genetics has so far focused on the study of common (minor allele frequency [MAF] ≥ 0.05) variants through candidate gene studies and, in recent years, through genome-wide association scans (GWAS). The candidate gene study era was unsuccessful in identifying many reproducible associations, partly due to the liberal thresholds used to declare statistical significance, and the issue of multiple testing became even more pronounced with the advent of GWAS. Following advances in large-scale genotyping and next-generation sequencing technologies, low-frequency (0.01 < MAF < 0.05) and rare (MAF < 0.01) variants are increasingly becoming the focus of genetic association studies, as they are hypothesized to have larger effect sizes, more readily interpretable functions, and translational potential. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) studies identify almost all sequence variation in the targeted genomic regions, and the variants identified are often extremely rare or unique; the number of variants observed increases with sample size.
