Abstract

Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals, 307,315 autosomal SNPs). Individual variation lies along a continuum, with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans represent only a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of these data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that removes redundancy from the selected SNP panels and show that it effectively removes correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure-informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied to sets of ancestry-informative markers selected with any method in order to select the most uncorrelated SNPs, significantly decreasing genotyping costs.

Highlights

  • The first Europeans from the Old World to land in what is US territory were Columbus’ men in 1493

  • The analysis described above suggests that both in the CORIELL and CHORI datasets, individuals of European American ancestry lie along a line, and all the variation is concentrated across the first eigenSNP, which corresponds to the first principal component

  • We have identified small sets of structure informative markers for the European American population through the direct investigation of European American samples and without depending on any assumptions about the ancestry or admixture proportions of the studied individuals

Summary

Introduction

The first Europeans from the Old World to land in what is US territory were Columbus’ men in 1493. The identification of population genetic structure has been discussed at length in recent literature because of the bias it can introduce in association studies searching for susceptibility genes for common complex disorders [2,3,4,5]. Population stratification is a source of confounding in case-control studies when allele frequency heterogeneity that is unrelated to the studied phenotype is coupled with disease-risk heterogeneity and biased sampling of cases and controls. Although European populations were initially considered genetically quite homogeneous, it has recently been shown that significant patterns of structure do exist within Europe along a north-to-south axis, and that unidentified population stratification in European-derived populations (European Americans) can lead to spurious associations with disease [5,6,7,8].

