Abstract

STRUCTURE remains the most widely applied software for recovering the true, but unknown, population structure from microsatellite or other genetic markers. About 30% of STRUCTURE‐based studies could not be reproduced (Molecular Ecology, 21, 2012, 4925). Here we use a large data set of 2,323 horses from 93 domestic breeds plus the Przewalski horse, typed at 15 microsatellites, to evaluate how program settings affect the estimation of the optimal number of population clusters, K_opt, that best describes the observed data. Domestic horses are a suitable test case because there is extensive background knowledge on the history of many breeds, as well as extensive phylogenetic analyses. Methods based on different genetic assumptions and statistical procedures (DAPC, FLOCK, PCoA, and STRUCTURE under different run scenarios) all revealed general, broad‐scale breed relationships that largely reflect known breed histories, but they diverged in how they characterized small‐scale patterns. STRUCTURE failed to consistently identify K_opt using the most widespread approach, the ΔK method, despite very large numbers of MCMC iterations (3,000,000) and replicates (100). Interpreting breed structure over increasing values of K, without assuming a K_opt, gave results consistent with known breed histories. The over‐reliance on K_opt should be replaced by a qualitative description of clustering over increasing K, which is scientifically more honest and also much faster and less computationally intensive, because lower numbers of MCMC iterations and replicates suffice for stable results. Very large data sets are highly challenging for cluster analyses, especially when populations with complex genetic histories are investigated.
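For readers unfamiliar with the ΔK method, the statistic of Evanno, Regnaut, and Goudet (2005) is derived from the ln P(D) values that STRUCTURE reports for each replicate run at each K. The Python sketch below computes it from replicate means as ΔK = |mean L(K+1) - 2·mean L(K) + mean L(K-1)| / sd L(K), where L(K) is ln P(D) and the standard deviation is taken across replicates at K. This is a minimal illustration under that assumption: the function name delta_k and the log-likelihood values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return a dict {K: Delta K} from replicate ln P(D) values.

    lnp_by_k maps each tested K to a list of ln P(D) values, one per
    replicate run (at least two replicates per K are needed for stdev).
    Delta K is computed from replicate means as
        |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    scores = {}
    for k in sorted(lnp_by_k):
        # Delta K needs both neighbours, so the smallest and largest K
        # tested (including K = 1) never receive a score.
        if k - 1 not in means or k + 1 not in means:
            continue
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        sd = stdev(lnp_by_k[k])
        # Identical replicates give sd = 0; report infinity to flag it.
        scores[k] = abs(second_diff) / sd if sd > 0 else float("inf")
    return scores


# Hypothetical ln P(D) values for three replicate runs at K = 1..4.
example = {
    1: [-1000.0, -1002.0, -1001.0],
    2: [-900.0, -905.0, -895.0],
    3: [-880.0, -882.0, -878.0],
    4: [-875.0, -890.0, -860.0],
}

scores = delta_k(example)
k_opt = max(scores, key=scores.get)
print(scores, k_opt)  # {2: 16.2, 3: 7.5} 2
```

Because ΔK needs values at both K-1 and K+1, it is undefined at the smallest and largest K tested and can never select K = 1. Because the standard deviation across replicates sits in the denominator, the number of replicates and the run-to-run variance directly influence which K receives the highest score.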

Highlights

  • Molecular ecology and conservation biology heavily rely on the identification of population structure and genetic admixture between individuals and populations

  • Bayesian inference using Markov chain Monte Carlo (MCMC) simulation has been implemented in various computer programs, such as BAPS (Corander, Marttinen, Sirén, & Tang, 2008) and STRUCTURE (Pritchard et al., 2000)

  • Using large-scale screening of domestic horse breeds alongside the Przewalski horse, we aimed to evaluate the robustness of STRUCTURE and ΔK with emphasis on the effects of the numbers of MCMC iterations and replicates


Summary

| INTRODUCTION

Molecular ecology and conservation biology heavily rely on the identification of population structure and genetic admixture between individuals and populations. Factors underlying inconsistent STRUCTURE results include the inherent stochastic nature of the model-fitting procedure, nonconvergence of parameter estimates due to an inappropriate number of MCMC iterations, an inappropriate number of replicate runs (R), weak population structure, too few informative microsatellite loci, and the estimation procedure per se (Gilbert et al., 2012; Putman & Carbone, 2014). STRUCTURE works best with relatively small numbers of demes or populations (Pritchard et al., 2000), but its performance with large data sets remains unevaluated.

Two large microsatellite studies investigated 67 and 41 breeds, respectively, and constructed distance-based phylogenies using a variety of phylogenetic methods (Conant, Juras, & Cothran, 2012; Pires et al., 2016). These approaches revealed patterns largely reflecting known breed histories, but statistical support was consistently low (Cothran & Luís, 2005).

Discriminant analysis of principal components (DAPC), as implemented in the software ADEGENET (Jombart, 2008; Jombart, Devillard, & Balloux, 2010), is a model-free multivariate method.

| MATERIALS AND METHODS
| DISCUSSION