Abstract

Background: New polymorphism datasets from heterochroneous data have arisen thanks to recent advances in experimental and microbial molecular evolution and the sequencing of ancient DNA (aDNA). However, classical tools for population genetics analyses do not take into account heterochrony between subsets, despite the potential bias on neutrality and population structure tests. Here, we characterize the extent of such biases using serial coalescent simulations.

Methodology/Principal Findings: We first use a coalescent framework to generate datasets assuming either no heterochrony or different levels of it, and compare most classical population genetic statistics across them. We show that even weak levels of heterochrony (∼10% of the average depth of a standard population tree) substantially affect the distribution of polymorphism, leading to overestimation of the level of polymorphism θ and to star-like trees with an excess of rare mutations and a deficit of linkage disequilibrium, which are the hallmarks of, e.g., population expansion (possibly after a drastic bottleneck). For more heterochroneous and equilibrated datasets, the tests depart substantially in the opposite direction, with balanced trees mimicking in particular population contraction, balancing selection, and population differentiation. We therefore introduce simple corrections to classical estimators of polymorphism and of the genetic distance between populations in order to remove heterochrony-driven bias. Finally, we show that these effects do occur in real aDNA datasets, taking advantage of the currently available sequence data for Cave Bears (Ursus spelaeus), for which large mtDNA haplotypes have been reported over a substantial time period (22–130 thousand years ago (KYA)).

Conclusions/Significance: Considering serial sampling changed the conclusion of several tests, indicating that neglecting heterochrony could provide significant support for a false past history of populations and lead to inappropriate conservation decisions. We therefore argue for systematically considering heterochroneous models when analyzing heterochroneous samples covering a large time scale.

Highlights

  • Most present population genetics analyses rely on coalescent theory, which represents the genetic history of a random set of gene copies as genealogical trees whose nodes are coalescent events, that is, points where two evolutionary lines of descent reach a common ancestor [1] (Figure 1A; see Table 1 for a summary of notations).

  • This sampling theory allows an efficient treatment of data and overall good predictions for the outcome of evolution on a set of gene copies in population(s) under specific demographic and migration scenarios. It is most often used within the framework of the classical population genetics Wright-Fisher model (WF) [2], and one of its several implicit assumptions is that all individuals are sampled at the same time.

  • This is reasonable for most datasets sampled from extant species, since (1) polymorphism arises from mutations occurring along the total length of genealogies and (2) the number of generations covered across the sample is low relative to the total depth of the genealogy, which lasts on average 2 Ne generations for mitochondrial DNA, where Ne, the effective size of the population considered, is assumed to be large in the coalescent framework.
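
The 2 Ne depth quoted in the last point follows from the standard coalescent expectation. A short derivation is sketched here, assuming a haploid Wright-Fisher population of constant size Ne and a contemporaneous sample of n gene copies; the symbols n, k, and T_k (the time during which the genealogy has k lineages, in generations) are notation introduced for this sketch and are not taken from the paper's Table 1.

```latex
\begin{align}
E[T_k] &= \frac{N_e}{\binom{k}{2}} = \frac{2N_e}{k(k-1)}, \\
E[T_{\mathrm{MRCA}}] &= \sum_{k=2}^{n} E[T_k]
  = 2N_e \sum_{k=2}^{n} \left(\frac{1}{k-1} - \frac{1}{k}\right)
  = 2N_e\left(1 - \frac{1}{n}\right)
  \xrightarrow[n \to \infty]{} 2N_e .
\end{align}
```

Under these assumptions the expected depth is already N_e generations for n = 2 and approaches 2 N_e as the sample grows, so a sampling span of a few generations is negligible by comparison whenever N_e is large.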


Summary

Introduction

Most present population genetics analyses rely on coalescent theory, which represents the genetic history of a random set of gene copies as genealogical trees whose nodes are coalescent events, that is, points where two evolutionary lines of descent reach a common ancestor [1] (Figure 1A; see Table 1 for a summary of notations). This sampling theory allows an efficient treatment of data and overall good predictions for the outcome of evolution on a set of gene copies in population(s) under specific demographic and migration scenarios. It is most often used within the framework of the classical population genetics Wright-Fisher model (WF) [2] (hereafter "standard"), and one of its several implicit assumptions is that all individuals are sampled at the same time (hereafter "contemporaneous"). We characterize the extent of the biases that can arise when this assumption does not hold, using serial coalescent simulations.
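
To make the idea of a serial coalescent concrete, here is a minimal Python sketch. It is not the authors' simulator: the function names, the two-sampling-time design, and the choice of time unit (Ne generations for a haploid locus, so that k lineages coalesce at rate k(k-1)/2) are all assumptions made for illustration. It simulates the total branch length of a genealogy in which some gene copies are sampled at the present and the others at an earlier time, then reports the expected value of Watterson's estimator relative to the true θ when all sequences are treated as contemporaneous.

```python
import random


def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the normalising constant of Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def serial_tree_length(n_modern, n_ancient, age, rng):
    """Total branch length of one genealogy with two sampling times.

    Time runs backward from the present in units of Ne generations
    (haploid locus, an assumption of this sketch). The ancient lineages
    join the genealogy at time `age`, so they carry no branch between
    the present and `age`.
    """
    k = n_modern          # lineages currently in the genealogy
    t = 0.0               # current time, backward from the present
    length = 0.0          # accumulated branch length
    pending = n_ancient   # ancient lineages not yet added
    while k > 1 or pending > 0:
        # Waiting time to the next coalescence among k lineages.
        wait = rng.expovariate(k * (k - 1) / 2.0) if k > 1 else float("inf")
        if pending > 0 and t + wait >= age:
            # The ancient sampling time is reached first: add those lineages.
            # Redrawing the waiting time afterwards is valid because the
            # exponential distribution is memoryless.
            length += k * (age - t)
            t = age
            k += pending
            pending = 0
        else:
            # A coalescence happens: two lineages merge into one.
            length += k * wait
            t += wait
            k -= 1
    return length


def mean_watterson_ratio(n_modern, n_ancient, age, reps=20000, seed=1):
    """Monte Carlo estimate of E[theta_W] / theta when heterochrony is ignored.

    With theta = 2 * Ne * mu, E[S] = (theta / 2) * E[L] and theta_W = S / a_n,
    so the ratio equals E[L] / (2 * a_n).
    """
    rng = random.Random(seed)
    n = n_modern + n_ancient
    total = sum(serial_tree_length(n_modern, n_ancient, age, rng)
                for _ in range(reps))
    return (total / reps) / (2.0 * harmonic(n))


if __name__ == "__main__":
    for age in (0.0, 0.1, 0.2):
        print(age, round(mean_watterson_ratio(10, 10, age), 3))
```

A ratio close to 1 means the classical estimator is unbiased for that sampling scheme; a ratio above 1 means that ignoring the age of the older samples inflates the estimate of θ. In this sketch an age of 0.2 corresponds to roughly 10% of the expected tree depth of about 2 time units, but the numbers it prints are illustrative only and should not be read as the paper's results.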

