Abstract

Multilocus sequence data provide far greater power to resolve species limits than the single-locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding, because of the number of possible delimitations that must be compared and the time-consuming likelihood calculations. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces two innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. Across multiple loci, the topologies of a triplet follow a uniform trinomial distribution when the 3 individuals belong to the same species, but a skewed distribution, with a form specified by the multispecies coalescent, when they belong to separate species. We developed a Bayesian model comparison framework in which the best delimitation is found by comparing the product of the posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm that finds the optimum delimitation among all those compatible with a guide tree by successively analyzing the subtrees defined by each node. This algorithm removes the need for the heuristic searches used by current methods and guarantees that the best solution is found; it could potentially be used in other systematic applications. We assessed the performance of the method with simulated, published, and newly generated data. Analyses of simulated data demonstrate that the combined method has favorable statistical properties and scales well with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
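
The triplet comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard multispecies coalescent result that, for three lineages from separate species, the concordant topology has probability 1 - (2/3)e^(-t) and each discordant topology has probability (1/3)e^(-t), where t is the internal branch length in coalescent units. The Exp(1) prior on t, the equal prior over the three possible species topologies, the equal prior on the two models, and all function names are assumptions made only for this illustration.

```python
import math

def log_multinomial(counts, probs):
    """Log probability of observed topology counts under a trinomial model."""
    n = sum(counts)
    logp = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        logp += c * math.log(p) - math.lgamma(c + 1)
    return logp

def log_marginal_same(counts):
    """Same-species model: the three topologies are equally likely."""
    return log_multinomial(counts, (1 / 3, 1 / 3, 1 / 3))

def log_marginal_different(counts, steps=2000):
    """Separate-species model, integrated over the branch length t.

    Assumption: t ~ Exp(1), so x = exp(-t) is uniform on (0, 1) and the
    integral over t becomes a midpoint-rule sum over x. Each of the three
    topologies is treated as the possible species tree with prior 1/3.
    Intended for modest numbers of loci (no underflow protection).
    """
    total = 0.0
    for major in range(3):
        for i in range(steps):
            x = (i + 0.5) / steps
            probs = [x / 3.0] * 3            # discordant topologies
            probs[major] = 1.0 - 2.0 * x / 3.0  # concordant topology
            total += math.exp(log_multinomial(counts, probs))
    return math.log(total / (3 * steps))

def posterior_different(counts, prior_different=0.5):
    """Posterior probability that the three individuals are separate species."""
    ls = log_marginal_same(counts) + math.log(1.0 - prior_different)
    ld = log_marginal_different(counts) + math.log(prior_different)
    m = max(ls, ld)
    return math.exp(ld - m) / (math.exp(ls - m) + math.exp(ld - m))

if __name__ == "__main__":
    # Counts of the three rooted topologies for one triplet across 20 loci.
    print(posterior_different((18, 1, 1)))  # strongly skewed counts
    print(posterior_different((7, 7, 6)))   # nearly uniform counts
```

Strongly skewed counts favor the separate-species model, whereas nearly uniform counts favor the same-species model. A full delimitation would combine such posterior probabilities across all triplets, as the abstract describes.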

Introduction

Species constitute the basic taxonomic unit for exchanging information about biological diversity. DNA-based delimitation provides a universal method, applicable to a wide range of organisms, for detecting the signature of species. Methods to delimit species from DNA sequences alone have been actively developed over the last decade. In early applications of DNA-based delimitation, the available markers were limited to a handful of barcoding loci customized for each type of organism (such as cox1 for animals, Hebert et al 2003), and delimitation methods were designed to handle these single-locus sequences (Pons et al 2006; Puillandre et al 2012; Fujisawa and Barraclough 2013; Zhang et al 2013). As the cost of sequencing large amounts of DNA has dramatically decreased, and the ease of developing nuclear markers from genome data has increased, the focus has naturally shifted from single-locus to multilocus approaches.
