Abstract

Background: Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is so computationally intensive that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses.

Results: We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST and achieves better convergence rates for large numbers of loci.

Conclusions: Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable.
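The divide-and-conquer procedure described in the abstract (random binning, co-estimation per bin, then a summary step over all estimated gene trees) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `partition_loci` and `bbca_sketch` are hypothetical, and the `coestimate` and `summarize` callables are placeholders that a user would have to wire up to actual *BEAST and MP-EST runs.

```python
import random
from typing import Callable, List, Sequence


def partition_loci(loci: Sequence[str], bin_size: int, seed: int = 0) -> List[List[str]]:
    """Randomly split loci into bins of at most `bin_size` loci each."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    shuffled = list(loci)
    # A seeded generator keeps the random partition reproducible.
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


def bbca_sketch(
    loci: Sequence[str],
    bin_size: int,
    coestimate: Callable[[List[str]], List[str]],
    summarize: Callable[[List[str]], str],
    seed: int = 0,
) -> str:
    """Hypothetical outline of the pipeline.

    `coestimate` stands in for a co-estimation run (e.g. *BEAST) on one bin
    and is assumed to return one estimated gene tree per locus in that bin.
    `summarize` stands in for a summary method (e.g. MP-EST) applied to the
    pooled gene trees and is assumed to return a species tree.
    """
    gene_trees: List[str] = []
    for locus_bin in partition_loci(loci, bin_size, seed):
        gene_trees.extend(coestimate(locus_bin))
    return summarize(gene_trees)
```

With 100 loci and `bin_size=25`, `partition_loci` yields four bins of 25 loci, matching the 25-gene bins discussed in the results.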

Highlights

  • Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size

  • We address the challenge of using *BEAST and other Bayesian coalescent-based methods for co-estimating species trees and gene trees. These methods are statistically consistent under the multi-species coalescent model [6], which means that as the number of genes and their sequence lengths both increase, the probability that the method will return the true species tree will increase to 1. While these Bayesian methods have excellent accuracy in simulations and on biological datasets [7,8,9], they use computationally intensive MCMC approaches that in practice limit them to relatively small numbers of loci; for example, *BEAST did not converge on 100-gene simulated datasets with 11 taxa within 150 hours [9], and analyses on biological datasets can take weeks [10]

  • We compare the topological error of species trees computed using three methods: BBCA, *BEAST, and concatenation using maximum likelihood (CA-ML) (Figure 1)
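The highlights refer to the topological error of estimated species trees without naming a metric in this excerpt. As one plausible illustration, the sketch below computes a missing branch rate: the fraction of non-trivial clades in the true rooted tree that are absent from the estimated tree. The choice of metric, the nested-tuple tree representation, and the function names are all assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (assumed encoding).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf sets of all internal nodes, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves_below(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def missing_branch_rate(true_tree: Tree, estimated_tree: Tree) -> float:
    """Fraction of true non-trivial clades not recovered in the estimate."""
    true_clades = clades(true_tree)
    if not true_clades:
        return 0.0
    return len(true_clades - clades(estimated_tree)) / len(true_clades)


true_tree = ((("A", "B"), "C"), ("D", "E"))
estimated = ((("A", "C"), "B"), ("D", "E"))
rate = missing_branch_rate(true_tree, estimated)  # one of three clades is missing
```

Here the true tree has three non-trivial clades ({A,B}, {A,B,C}, {D,E}) and the estimate recovers all but {A,B}, so the rate is one third.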


Summary

Results

We present results for these analyses here; see Additional file 1 for further details. The ESS values suggest that *BEAST was much closer to converging when run in the BBCA analysis than when run on the full set of 100 genes, even when *BEAST was allowed to run for 96 hours (Figure 2). Although the ESS values suggest that both analyses had reached comparable levels of convergence, the difference in species tree accuracy may reflect a failure of *BEAST to converge sufficiently on the 50-gene bins. We examined the impact of allowing *BEAST to run for 48 hours instead of 24 hours on each 25-gene bin on the Laurasiatheria simulated datasets. ESS values improved for *BEAST on these bins when run for 48 hours, but the resulting species tree estimate did not change for any sequence length condition, whether we ran *BEAST for 24 hours or 48 hours on each bin. MP-EST always completed in less than 2 minutes.
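The convergence comparison above relies on effective sample size (ESS) values from the MCMC runs. As a rough illustration of what such a number measures, the sketch below uses one common textbook estimator, N / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation. This is an assumption for illustration only; it is not necessarily the estimator used by the tools in the study, and the function name is hypothetical.

```python
from typing import Sequence


def effective_sample_size(samples: Sequence[float]) -> float:
    """Estimate ESS of a 1-D MCMC trace from its sample autocorrelations."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    c0 = sum(d * d for d in centered)  # lag-0 autocovariance times n
    if c0 == 0:
        raise ValueError("trace has zero variance")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / c0
        if rho <= 0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

A strongly autocorrelated trace (for example, a slow drift) yields an ESS far below the number of stored samples, which is the situation a low ESS flags in a run that has not yet converged.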
