Abstract

Population genetic studies provide insights into the evolutionary processes that influence the distribution of sequence variants within and among wild populations. FST is among the most widely used measures for genetic differentiation and plays a central role in ecological and evolutionary genetic studies. It is commonly thought that large sample sizes are required in order to precisely infer FST and that small sample sizes lead to overestimation of genetic differentiation. Until recently, studies in ecological model organisms incorporated a limited number of genetic markers, but since the emergence of next generation sequencing, the panel size of genetic markers available even in non-reference organisms has rapidly increased. In this study we examine whether a large number of genetic markers can substitute for small sample sizes when estimating FST. We tested the behavior of three different estimators that infer FST and that are commonly used in population genetic studies. By simulating populations, we assessed the effects of sample size and the number of markers on the various estimates of genetic differentiation. Furthermore, we tested the effect of ascertainment bias on these estimates. We show that the population sample size can be significantly reduced (as small as n = 4–6) when using an appropriate estimator and a large number of bi-allelic genetic markers (k>1,000). Therefore, conservation genetic studies can now obtain almost the same statistical power as studies performed on model organisms using markers developed with next-generation sequencing.

Highlights

  • Studies on wild populations give important insights into population dynamics leading to genetic differentiation

  • Although the statistical behavior of different FST estimators has been analyzed before, this study is the first to evaluate different FST estimators in population genetic studies with thousands of loci and very small sample sizes. This is a timely matter, given that Next Generation Sequencing (NGS) methods have revolutionized the field of marker development

  • The original FSTW estimator severely overestimates the level of genetic differentiation when using small sample sizes

Summary

Introduction

Studies on wild populations give important insights into population dynamics leading to genetic differentiation. SNP assays are often designed using small panels incorporating only a fraction of the populations and individuals that are later genotyped for these SNPs. As a result, common polymorphisms are more likely to be detected than rare variants, skewing minor allele frequencies (MAF) toward higher values [2]. New methods incorporating next-generation sequencing make it possible to develop thousands of SNP assays with less bias and at a fraction of previous costs, even in non-reference organisms [6]. The question arises whether the large increase in the number of available genetic markers reduces the sample sizes required to obtain reliable estimates of FST. Reducing the sample size per population would make it possible to analyze a larger number of different populations at the same cost, and it offers an important advantage in conservation genetic studies on rare organisms. Understanding the statistics of different FST estimators is especially important in the context of detecting regions under selection.
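
The trade-off between sample size and marker number can be illustrated with a small simulation. The sketch below is not the simulation design or any of the three estimators evaluated in this study. It assumes a two-population Balding-Nichols model, ancestral allele frequencies drawn uniformly between 0.1 and 0.9, and a Hudson-style ratio-of-sums estimator with and without a sample-size correction. The function names `simulate_counts` and `fst_ratio_of_sums` are hypothetical and chosen only for this illustration.

```python
import random


def simulate_counts(k, n_diploid, fst, rng):
    """Simulate allele counts at k bi-allelic loci in two populations.

    Assumption: Balding-Nichols model. Each population's allele frequency
    is Beta-distributed around a shared ancestral frequency, with the
    spread controlled by the true FST.
    """
    n_alleles = 2 * n_diploid          # allele copies sampled per population
    scale = (1.0 - fst) / fst
    counts = []
    for _ in range(k):
        p_anc = rng.uniform(0.1, 0.9)  # assumed ancestral frequency range
        locus = []
        for _pop in range(2):
            q = rng.betavariate(p_anc * scale, (1.0 - p_anc) * scale)
            # Binomial draw of n_alleles copies at population frequency q
            locus.append(sum(rng.random() < q for _ in range(n_alleles)))
        counts.append(tuple(locus))
    return counts


def fst_ratio_of_sums(counts, n_alleles, correct):
    """Hudson-style FST, combined across loci as a ratio of sums.

    With correct=False the squared frequency difference is used as is,
    so binomial sampling noise inflates the estimate at small n.
    With correct=True the expected sampling noise is subtracted.
    """
    num = 0.0
    den = 0.0
    for x1, x2 in counts:
        p1 = x1 / n_alleles
        p2 = x2 / n_alleles
        diff_sq = (p1 - p2) ** 2
        if correct:
            diff_sq -= p1 * (1 - p1) / (n_alleles - 1)
            diff_sq -= p2 * (1 - p2) / (n_alleles - 1)
        num += diff_sq
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


if __name__ == "__main__":
    rng = random.Random(1)
    true_fst = 0.05
    n_diploid = 5                      # five diploid individuals per population
    counts = simulate_counts(2000, n_diploid, true_fst, rng)
    for correct in (False, True):
        est = fst_ratio_of_sums(counts, 2 * n_diploid, correct)
        print(f"sample-size correction={correct}: FST estimate = {est:.3f}")
```

Under these assumptions, the uncorrected estimate is expected to lie well above the simulated FST of 0.05 when only five individuals are sampled per population, while the corrected estimate should stay close to it once a couple of thousand loci are combined. Because ancestral frequencies are restricted to intermediate values here, the sketch says nothing about rare variants or about ascertainment bias.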
