Abstract

Single nucleotide polymorphism (SNP) data are widely used in research on natural populations. Although useful, SNP genotyping data are known to contain a bias, usually referred to as ascertainment bias, because they are conditioned on variants that have already been discovered. This bias is introduced during the genotyping process, for example through the choice of populations used for novel SNP discovery, the number of individuals in the discovery panel, and the selection of SNP markers. It is widely recognized that ascertainment bias can lead to inaccurate inferences in population genetics, and several methods to address it have been proposed. However, especially in natural populations, it is not always possible to apply an ideal ascertainment scheme, because natural populations tend to have complex structures and histories. In addition, whether ascertainment bias has the same effect on different types of population structure has not been fully assessed. Here, we examine the effects of bias introduced by the choice of populations for SNP discovery and by the subsequent SNP marker selection under three demographic models: the island, stepping-stone, and population split models. The results show that site frequency spectra and summary statistics contain biases that depend on the joint effect of population structure and ascertainment scheme. Inferences of population structure are also affected by ascertainment bias. Based on these results, we recommend evaluating the validity of the ascertainment strategy before the actual genotyping, because the direction and extent of ascertainment bias vary depending on several factors.
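To make the effect of discovery-panel size on the site frequency spectrum (SFS) concrete, the following is a minimal sketch of the simplest possible case. It assumes a single panmictic population under the standard neutral model (expected SFS proportional to 1/k) and a discovery panel of d chromosomes drawn at random from the n sampled chromosomes, with a site ascertained only if it is polymorphic in that panel. These assumptions and all function names are hypothetical illustrations; the sketch does not reproduce the island, stepping-stone, or split-model simulations analysed in the study.

```python
from math import comb


def neutral_sfs(n):
    """Expected unfolded SFS for n chromosomes under the standard neutral model.

    Entry k-1 is the proportion of segregating sites carrying k derived
    copies (k = 1..n-1); the expectation is proportional to 1/k.
    """
    raw = [1.0 / k for k in range(1, n)]
    total = sum(raw)
    return [x / total for x in raw]


def ascertainment_prob(k, n, d):
    """Probability that a site with k derived copies among n chromosomes is
    polymorphic in a discovery panel of d chromosomes sampled without
    replacement from those n (hypergeometric sampling)."""
    if not 2 <= d <= n:
        raise ValueError("panel size d must satisfy 2 <= d <= n")
    # math.comb(a, b) returns 0 when b > a, so panels larger than the number
    # of derived (or ancestral) copies are handled without special cases.
    monomorphic = comb(k, d) + comb(n - k, d)
    return 1.0 - monomorphic / comb(n, d)


def ascertained_sfs(sfs, n, d):
    """Reweight an SFS by the discovery probability and renormalize."""
    weighted = [p * ascertainment_prob(k, n, d) for k, p in enumerate(sfs, start=1)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    n = 20  # chromosomes in the genotyped sample (illustrative value)
    full = neutral_sfs(n)
    for d in (2, 5, n):  # discovery-panel sizes to compare
        asc = ascertained_sfs(full, n, d)
        print(f"d={d:2d}  singleton share: {full[0]:.3f} -> {asc[0]:.3f}")
```

With a two-chromosome panel the discovery probability reduces to 2k(n − k)/(n(n − 1)), so for n = 20 the share of singletons falls from about 0.28 to 0.10 while intermediate-frequency variants are enriched; when the panel includes all n chromosomes the spectrum is unchanged.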

Highlights

  • Recent developments in genotyping technology have made it possible to use vast amounts of genetic information and have drawn increasing attention to the usefulness of single nucleotide polymorphisms (SNPs) in ecology, evolution, and medical sciences (Brumfield et al 2003; The International HapMap Consortium 2003; Manolio et al 2008; Brito and Edwards 2009; Ng et al 2009)

  • Because the SNPs found in a discovery panel represent only a fraction of the variable sites in the original population, the genotype data are to some degree ‘distorted’: they depend on the specific original population and on the number of individuals in the discovery panel

  • This distortion indicates the presence of ascertainment bias, which depends on the sampling location of the discovery panels and on the methods used for marker generation


Summary

Introduction

Recent developments in genotyping technology have made it possible to use vast amounts of genetic information and have drawn increasing attention to the usefulness of SNPs in ecology, evolution, and medical sciences (Brumfield et al 2003; The International HapMap Consortium 2003; Manolio et al 2008; Brito and Edwards 2009; Ng et al 2009). Because the SNPs found in a discovery panel represent only a fraction of the variable sites in the original population, the genotype data are to some degree ‘distorted’: they depend on the specific original population and on the number of individuals in the discovery panel. The existence of this ascertainment bias due to SNP selection is well known and has been investigated in previous studies (Rogers and Jorde 1996; Kuhner et al 2000; Nielsen 2000; Wakeley et al 2001; Akey et al 2003; Nielsen and Signorovitch 2003; Nielsen 2004; Nielsen et al 2004; Clark et al 2005; Lachance and Tishkoff 2013; Quinto-Cortes et al 2018). Since various summary statistics used in population genetic inference (e.g., π and Tajima’s D) depend on the allele frequency spectrum, this bias can cause inaccurate estimates.
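Because π and Tajima's D are both functions of the allele frequency spectrum, filtering sites through a discovery panel shifts them even when the population itself is unchanged. The sketch below computes π (mean pairwise differences), Watterson's θ, and Tajima's (1989) D directly from an unfolded SFS and then applies a random discovery-panel filter. It is a simple neutral illustration under stated assumptions: expected (non-integer) site counts E[ξ_k] = θ/k, a single population, and a panel drawn from the sampled chromosomes. The function names and parameter values are hypothetical, not taken from the study.

```python
from math import comb, sqrt


def summary_stats(sfs, n):
    """Return (pi, theta_w, tajimas_d) from an unfolded SFS.

    sfs[k-1] is the number of segregating sites with k derived copies
    among n chromosomes (k = 1..n-1).
    """
    s = sum(sfs)  # number of segregating sites
    pi = sum(k * (n - k) * x for k, x in enumerate(sfs, start=1)) / comb(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    # Variance constants from Tajima (1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    tajimas_d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, tajimas_d


def discovery_prob(k, n, d):
    """Chance that a site with k derived copies is polymorphic in a random
    discovery panel of d of the n chromosomes (hypergeometric sampling)."""
    return 1.0 - (comb(k, d) + comb(n - k, d)) / comb(n, d)


if __name__ == "__main__":
    n, theta = 20, 100.0                         # illustrative values
    neutral = [theta / k for k in range(1, n)]   # expected neutral SFS
    panel = 2                                    # discovery-panel size (chromosomes)
    ascertained = [x * discovery_prob(k, n, panel)
                   for k, x in enumerate(neutral, start=1)]
    for label, sfs in (("all sites", neutral), ("ascertained", ascertained)):
        pi, tw, tajd = summary_stats(sfs, n)
        print(f"{label:12s} pi={pi:7.2f} theta_W={tw:7.2f} D={tajd:+.2f}")
```

For the full neutral spectrum π equals θ_W and D is zero, whereas the ascertained set yields π greater than θ_W and a positive D, because the small panel preferentially retains intermediate-frequency variants. The shift comes entirely from ascertainment, not from demography.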
