Abstract

Incomplete lineage sorting (ILS), modelled by the multi-species coalescent, is a process that results in a gene tree being different from the species tree. Because ILS is expected to occur for at least some loci within genome-scale analyses, the evaluation of species tree estimation methods in the presence of ILS is of great interest. Performance on simulated and biological data has suggested that concatenation analyses can result in the wrong tree with high support under some conditions, and a recent theoretical result by Roch and Steel proved that concatenation using unpartitioned maximum likelihood analysis can be statistically inconsistent in the presence of ILS. In this study, we survey the major species tree estimation methods, including the newly proposed “statistical binning” methods, and discuss their theoretical properties. We also note that there are two interpretations of the term “statistical consistency”, and discuss the theoretical results proven under both interpretations.

Highlights

  • Estimating species trees from multiple loci is commonly performed using concatenation methods, in which multiple sequence alignments from different genomic regions are concatenated into one large supermatrix, and a tree is estimated on the supermatrix

  • As proven in [10], pipelines based on weighted statistical binning followed by summary methods such as MP-EST or ASTRAL are statistically consistent using the first definition, which allows the number of sites per locus as well as the number of loci to increase

  • We can definitively answer this question with respect to the first meaning of statistical consistency: as shown in [10], phylogenomic pipelines that use weighted statistical binning followed by coalescent-based summary methods such as MP-EST will converge in probability to the true species tree as the number of loci and sites per locus both increase
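
The convergence statements in the highlights above can be written compactly. The notation here is an assumption introduced for illustration, not taken from the paper: T^* denotes the true species tree, and \hat{T}_{m,k} denotes the tree a method returns from m loci with k sites per locus. The exact form of the joint limit is also an assumption; the precise statement is the one proven in [10].

```latex
% First sense: both the number of loci m and the sites per locus k increase.
\lim_{m \to \infty,\; k \to \infty} \Pr\bigl(\hat{T}_{m,k} = T^{*}\bigr) = 1

% Second (stronger) sense: only m increases; the per-locus length k stays bounded.
\lim_{m \to \infty} \Pr\bigl(\hat{T}_{m,k} = T^{*}\bigr) = 1
\quad \text{for every fixed } k
```

A method that satisfies the second statement also satisfies the first under this reading, which is why the second sense is described as the stronger one.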


Summary

Introduction

Estimating species trees from multiple loci is commonly performed using concatenation methods, in which multiple sequence alignments from different genomic regions are concatenated into one large supermatrix, and a tree is estimated on the supermatrix. A few coalescent-based species tree estimation methods have been proven to be statistically consistent in the second (i.e., stronger) sense of the term, which establishes that the species tree estimated by the method converges to the true species tree as the number of loci is allowed to increase, even when the sequence length per locus is bounded [7,16,18,19].
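
The concatenation step described in the opening sentence of this section can be illustrated with a minimal Python sketch. The function name `concatenate_alignments`, the dictionary representation of an alignment (taxon name mapped to aligned sequence), and the choice to pad taxa missing from a locus with gap characters are all assumptions made for this example; they do not come from the paper or from any particular software package. Tree estimation on the resulting supermatrix is not shown.

```python
from typing import Dict, List


def concatenate_alignments(loci: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Concatenate per-locus alignments into one supermatrix.

    Each locus is a dict mapping taxon name -> aligned sequence.
    Taxa absent from a locus are padded with gap characters (an assumption
    of this sketch; real pipelines may handle missing data differently).
    """
    # Collect every taxon that appears in at least one locus.
    taxa = sorted({taxon for locus in loci for taxon in locus})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for locus in loci:
        if not locus:
            continue  # an empty locus contributes no columns
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        n_sites = lengths.pop()
        for taxon in taxa:
            # Append this locus's columns, or gaps if the taxon is missing.
            parts[taxon].append(locus.get(taxon, gap * n_sites))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example: two loci, with taxon C missing from the second locus.
locus1 = {"A": "ACGT", "B": "ACGA", "C": "ACTT"}
locus2 = {"A": "GG", "B": "GC"}
supermatrix = concatenate_alignments([locus1, locus2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

In this toy input the supermatrix has six columns per taxon: four from the first locus and two from the second. A concatenation analysis would then estimate a single tree from those six columns, treating all loci as if they shared one history.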

