Abstract
Background: Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed.
Results: We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves.
Conclusions: We show in simulation studies that DISTIQUE has accuracy comparable to leading coalescent-based summary methods and reduced running times.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3098-z) contains supplementary material, which is available to authorized users.
Highlights
Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed
When consensus is used within DISTIQUE, the accuracy improves as the level of incomplete lineage sorting (ILS) decreases, as expected (Additional file 1: Figures S1 and S2)
Hereafter, we only show results for DISTIQUE applied to a majority consensus, and we omit all-pairs-max
Summary
Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. A desirable property for a summary method is statistical consistency: a theoretical guarantee that it converges in probability to the correct species tree as the number of error-free genes increases. Many statistically consistent summary methods are available (e.g., ASTRAL [3, 4], BUCKy-population [5], and MP-EST [6]), and coalescent-based species tree estimation is a vibrant field of research, with many recent examples of successful biological analyses [7,8,9] (see [10,11,12,13,14] for criticism of these methods, especially their sensitivity to gene tree error).