Abstract

Replicable genetic association signals have consistently been found through genome-wide association studies in recent years. The recent dramatic expansion of study sizes improves the power of effect-size estimation, genomic prediction, causal inference, and polygenic selection, but it simultaneously increases the susceptibility of these methods to bias from subtle population structure. Standard methods that use genetic principal components to correct for structure might not always be appropriate, and we use a simulation study to illustrate when correction might be ineffective at avoiding biases. New methods such as trans-ethnic modeling and chromosome painting allow for a richer understanding of the relationship between traits and population structure. We illustrate the arguments using real examples (stroke and educational attainment) and provide a more nuanced understanding of population structure, which is set to be revisited as a critical aspect of future analyses in genetic epidemiology. We also make simple recommendations for how problems can be avoided in the future. Our results have particular importance for the implementation of GWAS meta-analysis, for prediction of traits, and for causal inference.

Highlights

  • Is population structure relevant in genetic epidemiology? It could be taken for granted that the problem of population structure in genetic epidemiology is “solved”

  • Despite early concerns that phenotypes may be stratified by population (Cardon and Palmer 2003; Freedman et al 2004; Klein et al 2005; Marchini et al 2004), replication rates have been high since the arrival of the genome-wide association study (GWAS) (Pe’er et al 2008) and the consequent adoption of stringent genome-wide significance levels

  • Population structure has always been a feature of genetic studies of phenotypic variation


Summary

Introduction

It could be taken for granted that the problem of population structure (see “Box 1”) in genetic epidemiology is “solved”. The model correcting for ancestry would be preferred for prediction only if (a) it contained enough predictive power to capture real phenotypic differences, and (b) the use case involved generalization into populations for which ancestry may have different effects; for example, predicting skin cancer would be concerning if the predicted population’s skin tone fell outside the range of the study population or was caused by different underlying SNPs. Genetic “prediction ... In ALSPAC, genetic ancestry can predict 8% of the variation in education; for comparison, the most recent published whole-genome genetic score explains 3.2% (Okbay et al 2016), and a mega-scale analysis is expected to generate a genetic score explaining 10% of the variance (Martin 2018). These results are based on meta-analyses of many studies, in which PC correction may not have sufficiently controlled for population structure. New methodology should be able to exploit differences across populations to automatically screen SNPs and create causal graphs unique to each population.
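To make the idea of principal-component (PC) correction concrete, the following is a minimal sketch and not the simulation study described in the paper. It assumes two discrete populations whose allele frequencies are drawn from a Balding-Nichols model, a phenotype that differs between populations for purely non-genetic reasons, and no causal SNPs at all. All function names and parameter values (`simulate`, `top_pcs`, `assoc_stats`, `run_demo`, `fst=0.05`, and so on) are hypothetical choices made for illustration. In this deliberately simple setting the top PC should recover the population split, so it does not show the subtler cases in which correction may fail.

```python
import numpy as np

CHI2_1_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def simulate(n_per_pop=500, n_snps=1000, fst=0.05, shift=1.0, seed=0):
    """Two populations, null SNPs, phenotype shifted by population only."""
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, n_snps)           # ancestral allele frequencies
    a = p_anc * (1 - fst) / fst                      # Balding-Nichols beta parameters
    b = (1 - p_anc) * (1 - fst) / fst
    freqs = rng.beta(a, b, size=(2, n_snps))         # per-population frequencies
    pop = np.repeat([0, 1], n_per_pop)
    geno = rng.binomial(2, freqs[pop]).astype(float)
    geno = geno[:, geno.std(axis=0) > 0]             # drop monomorphic SNPs
    pheno = shift * pop + rng.normal(size=pop.size)  # no SNP has a causal effect
    return geno, pheno


def top_pcs(geno, k=1):
    """Top-k principal components of the standardized genotype matrix."""
    z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def assoc_stats(geno, pheno, covars=None):
    """Per-SNP squared t-statistics from linear regression with covariates."""
    n = geno.shape[0]
    cols = [np.ones(n)] if covars is None else [np.ones(n), covars]
    c = np.column_stack(cols)
    q, _ = np.linalg.qr(c)                           # orthonormal basis of covariates
    y = pheno - q @ (q.T @ pheno)                    # residualize phenotype
    x = geno - q @ (q.T @ geno)                      # residualize genotypes
    r = (x.T @ y) / np.sqrt((x ** 2).sum(axis=0) * (y @ y))
    df = n - c.shape[1] - 1
    return df * r ** 2 / (1 - r ** 2)


def run_demo(seed=0):
    """Return genomic inflation factors without and with PC1 as a covariate."""
    geno, pheno = simulate(seed=seed)
    lam_naive = np.median(assoc_stats(geno, pheno)) / CHI2_1_MEDIAN
    pcs = top_pcs(geno, k=1)
    lam_pc = np.median(assoc_stats(geno, pheno, pcs)) / CHI2_1_MEDIAN
    return lam_naive, lam_pc


if __name__ == "__main__":
    lam_naive, lam_pc = run_demo()
    print(f"inflation without correction: {lam_naive:.2f}")
    print(f"inflation with PC1 covariate: {lam_pc:.2f}")
```

The inflation factor here is the median test statistic divided by its expected value under the null, so values well above 1 indicate spurious association driven by structure. Because the simulated structure is a clean two-way split, a single PC is enough; finer-scale or continuous structure of the kind discussed above would be a harder test of the same procedure.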

