Abstract

Inferring population genetic structure from large-scale genotyping of single-nucleotide polymorphisms or variants is an important technique for studying the history and distribution of extant human populations, and it is also an important tool for adjusting tests of association. However, the inferred structure depends on the minor allele frequency of the variants, which is especially important when considering the phenotypic association of rare variants. Using the Genetic Analysis Workshop 18 data set for 142 unrelated individuals, which includes genotypes for many rare variants, we study the following hypothesis: the difference in detected structure is the result of a "scale" effect; that is, rare variants are likely to be shared only locally (at a smaller scale), while common variants can be spread over longer distances. The result is similar to that of using kernel principal component analysis as the bandwidth of the kernel is changed. We show how different structures become evident as we consider rare or common variants.
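One way to make the "scale" hypothesis concrete is to count, for each pair of individuals, how many variants within a given minor allele frequency (MAF) window both of them carry the minor allele at. Under the hypothesis, sharing of rare minor alleles should tend to concentrate among closely related or nearby individuals, while common alleles are shared more broadly. The Python sketch below is illustrative only. It assumes a hypothetical genotype matrix `G` coded as 0/1/2 alternate-allele counts, with individuals in rows and variants in columns. The function names are ours and do not come from the GAW18 analysis.

```python
import numpy as np


def minor_allele_frequency(G):
    """MAF per variant; G is (n_individuals, n_variants) with 0/1/2 alternate-allele counts."""
    alt_freq = np.asarray(G, dtype=float).mean(axis=0) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)  # fold so the minor allele is counted


def minor_allele_carriers(G):
    """1 where an individual carries at least one copy of the variant's minor allele, else 0."""
    G = np.asarray(G)
    alt_freq = G.mean(axis=0) / 2.0
    # Recode so that counts refer to the minor allele (flip variants whose alternate allele is major)
    minor_count = np.where(alt_freq <= 0.5, G, 2 - G)
    return (minor_count > 0).astype(int)


def shared_minor_alleles(G, maf_lo, maf_hi):
    """Pairwise counts of variants with MAF in [maf_lo, maf_hi) at which both individuals
    carry the minor allele (the diagonal is the number of such variants each one carries)."""
    G = np.asarray(G)
    maf = minor_allele_frequency(G)
    keep = (maf >= maf_lo) & (maf < maf_hi)
    carriers = minor_allele_carriers(G[:, keep])
    return carriers @ carriers.T


# Tiny hypothetical example: 4 individuals x 4 variants
G_toy = np.array([[1, 0, 2, 2],
                  [1, 0, 0, 2],
                  [0, 1, 0, 2],
                  [0, 0, 0, 1]])
print(minor_allele_frequency(G_toy))
print(shared_minor_alleles(G_toy, 0.0, 0.2))   # rarer variants only
print(shared_minor_alleles(G_toy, 0.2, 0.51))  # more common variants only
```

Comparing the matrices returned for a low and a high MAF window shows which pairs of individuals are linked by rare alleles and which by common ones.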

Highlights

  • Inferring population genetic structure from large-scale genotyping of single-nucleotide polymorphisms (SNPs) or variants (SNVs), often performed using principal component analysis (PCA) [1] or model-based clustering [2], is an important technique for studying the history and distribution of extant human populations [3], and it is also an important tool for adjusting tests of association [1,4]. Thanks to the increasing availability of sequencing technology, it is now possible to identify very rare variants and to type them in large samples of individuals, extending the reach of the genome-wide association study design

  • Methods for detecting population structure and for adjusting association tests should take into account that the inferred population structures depend on the minor allele frequency (MAF) of the SNVs; this is especially important when considering the phenotypic association of rare variants [5]

  • Using the Genetic Analysis Workshop 18 (GAW18) data set for 142 unrelated individuals, which includes genotypes for many rare variants, we show how different structures become evident as we consider rare or common variants and how these structures transform smoothly as we change the window of allowed MAF values
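The MAF-window analysis described in these highlights can be sketched as ordinary PCA restricted to the SNVs whose MAF falls inside a chosen window, repeated as the window slides along the MAF axis. This minimal Python sketch assumes a hypothetical 0/1/2 genotype matrix. It standardizes each variant by its binomial standard deviation, a common choice that we assume here rather than the authors' documented procedure. The function name `maf_window_pca` is hypothetical. The usage part simulates genotypes with no population structure at all, purely to show the calls.

```python
import numpy as np


def maf_window_pca(G, maf_lo, maf_hi, n_components=2):
    """PCA of the SNVs whose minor allele frequency lies in [maf_lo, maf_hi).

    G: (n_individuals, n_variants) array of 0/1/2 alternate-allele counts.
    Returns (principal component scores, number of SNVs used).
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                               # alternate-allele frequency
    maf = np.minimum(p, 1.0 - p)
    keep = (maf >= maf_lo) & (maf < maf_hi) & (maf > 0)    # also drop monomorphic SNVs
    X, p = G[:, keep], p[keep]
    # Center each SNV and scale by its binomial SD sqrt(2p(1-p)) (assumed normalization)
    Z = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]       # PC scores per individual
    return scores, int(keep.sum())


# Usage on simulated, structure-free genotypes (hypothetical data, 142 individuals)
rng = np.random.default_rng(0)
freqs = rng.uniform(0.005, 0.5, size=2000)
G_sim = rng.binomial(2, freqs, size=(142, 2000))

# Slide a MAF window of width 0.1 in steps of 0.05
windows = [(k * 0.05, k * 0.05 + 0.1) for k in range(9)]
results = {w: maf_window_pca(G_sim, *w) for w in windows}
for (lo, hi), (scores, m) in results.items():
    print(f"MAF [{lo:.2f}, {hi:.2f}): {m} SNVs, scores shape {scores.shape}")
```

The sign of each principal component is arbitrary. Before looking for a smooth transition between overlapping windows, it is therefore usually necessary to align signs, for example by correlating the scores of adjacent windows.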

Summary

Introduction

Inferring population genetic structure from large-scale genotyping of single-nucleotide polymorphisms (SNPs) or variants (SNVs), often performed using principal component analysis (PCA) [1] or model-based clustering [2], is an important technique for studying the history and distribution of extant human populations [3], and it is also an important tool for adjusting tests of association [1,4]. Thanks to the increasing availability of sequencing technology, it is now possible to identify very rare variants and to type them in large samples of individuals, extending the reach of the genome-wide association study design. The result of restricting PCA to different minor allele frequency (MAF) levels is similar to that of using kernel principal component analysis (KPCA) [6] as the bandwidth (ie, scale) of the kernel is changed (De la Cruz and Susan Holmes, work in preparation). This similarity between the behavior of PCA at different MAF levels and KPCA at different scales is further evidence, albeit circumstantial, of the connection between MAF levels and scale.
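For comparison, kernel PCA with a Gaussian (RBF) kernel makes the notion of scale explicit through its bandwidth. The source does not state which kernel De la Cruz and Holmes use, so the following are all assumptions made for illustration: the Gaussian kernel, the hypothetical function name `gaussian_kernel_pca`, and the simulated two-group data. This is a minimal Python sketch of standard kernel PCA, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel_pca(X, bandwidth, n_components=2):
    """Kernel PCA with k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).

    X: (n_samples, n_features), e.g. standardized genotypes (hypothetical input).
    Returns the projections of the samples onto the leading kernel components.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / (2.0 * bandwidth ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    Kc = H @ K @ H                             # center the kernel in feature space
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[top], 0.0, None), vecs[:, top]
    return vecs * np.sqrt(vals)                # projection = sqrt(eigenvalue) * eigenvector


# Simulated two-group data (hypothetical), compared across bandwidths
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(60, 5))
X_sim[:30, 0] += 5.0
X_sim[30:, 0] -= 5.0
kpca = {bw: gaussian_kernel_pca(X_sim, bw) for bw in (0.5, 2.0, 100.0)}
```

With a very small bandwidth the kernel matrix approaches the identity, so every individual looks equally distant from every other. With a very large bandwidth, exp(-d^2 / (2σ^2)) ≈ 1 - d^2 / (2σ^2). The centered kernel then becomes proportional to the linear Gram matrix, and the leading components approach those of ordinary PCA.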

Methods
Results
Discussion
Conclusion