Abstract

Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but reestimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.

Highlights

  • Population structure refers to patterns of genetic variation that arise due to non-random mating

  • If these patterns are correlated with environmental factors, they can lead to spurious associations and biased effect size estimates in genome-wide association studies (GWAS)

  • We set the migration rates in the two models to match the degree of population structure in the UK Biobank, measured by the average FST between regions (Leslie et al, 2015) and the genomic inflation factor for a GWAS of birthplace in individuals with ‘White British’ ancestry from the UK Biobank (Haworth et al, 2019)


Summary

Introduction

Population structure refers to patterns of genetic variation that arise due to non-random mating. If these patterns are correlated with environmental factors, they can lead to spurious associations and biased effect size estimates in genome-wide association studies (GWAS). Approaches such as genomic control (GC) (Devlin and Roeder, 1999), principal component analysis (PCA) (Price et al, 2006), linear mixed models (LMMs) (Kang et al, 2010; Loh et al, 2015) and linkage disequilibrium score regression (LDSC) (Bulik-Sullivan et al, 2015a) have been developed to detect and correct for this stratification. While some of this variation may be attributed to recent migration patterns (Abdellaoui et al, 2019), it could also reflect residual stratification in effect size estimates (Lawson et al, 2020).
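To make the simplest of these corrections concrete, the sketch below computes a genomic inflation factor from 1-degree-of-freedom association statistics and applies a genomic-control rescaling. It is an illustration under stated assumptions, not the pipeline used in this study: the function names and the simulated statistics are hypothetical, and inflation is modelled as a uniform scaling of the null chi-square statistics.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Return lambda: observed median chi-square over the expected null median."""
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    return float(np.median(chi2_stats) / CHI2_1DF_MEDIAN)


def genomic_control(chi2_stats):
    """Divide the statistics by lambda; lambda is floored at 1 so they are never inflated."""
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return chi2_stats / lam


# Hypothetical data: null statistics, then the same statistics inflated by 20%
# to mimic uniform confounding by population structure (an assumption).
rng = np.random.default_rng(0)
null_stats = rng.standard_normal(100_000) ** 2
inflated_stats = 1.2 * null_stats

lambda_null = genomic_inflation_factor(null_stats)
lambda_inflated = genomic_inflation_factor(inflated_stats)
corrected_stats = genomic_control(inflated_stats)
```

Because the rescaling is uniform across variants, a correction of this kind can remove only uniform inflation. It cannot undo stratification that affects some variants more than others, which is the situation the abstract describes for recent population structure.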

