Using Constrained-INC for Large-Scale Gene Tree and Species Tree Estimation.

Thien Le, Tandy Warnow, Aaron Sy, Erin K Molloy, Satish Rao, Qiuyi Zhang

doi:10.1109/tcbb.2020.2990867

Abstract

Incremental tree building (INC) is a new phylogeny estimation method that has been proven to be absolute fast converging under standard sequence evolution models. A variant of INC, called Constrained-INC, is designed for use in divide-and-conquer pipelines for phylogeny estimation, where a set of species is divided into disjoint subsets, trees are computed on the subsets using a selected base method, and the subset trees are then combined. We evaluate the accuracy of INC and Constrained-INC for gene tree and species tree estimation on simulated datasets, and compare them to similar pipelines using NJMerge (another method that merges disjoint trees). For gene tree estimation, we find that INC has very poor accuracy in comparison to standard methods, and even Constrained-INC (using maximum likelihood methods to compute constraint trees) does not match the accuracy of the better maximum likelihood methods. Results for species trees are somewhat different, with Constrained-INC coming close to the accuracy of the best species tree estimation methods while being much faster; furthermore, using Constrained-INC allows species tree estimation methods to scale to large datasets within limited computational resources. Overall, this study exposes the benefits and limitations of divide-and-conquer strategies for large-scale phylogenetic tree estimation.
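The divide-and-conquer pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the chunking rule in `divide`, the `toy_base_method` stand-in for a real tree estimation method, and all function names are assumptions introduced here. The merge step (the role played by Constrained-INC or NJMerge) is deliberately left out, since the abstract does not specify it.

```python
from typing import Callable, List, Sequence


def divide(taxa: Sequence[str], max_size: int) -> List[List[str]]:
    """Split taxa into disjoint consecutive chunks of at most max_size.

    Hypothetical decomposition rule used only for illustration.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [list(taxa[i:i + max_size]) for i in range(0, len(taxa), max_size)]


def is_disjoint_cover(subsets: Sequence[Sequence[str]], taxa: Sequence[str]) -> bool:
    """Check that the subsets are pairwise disjoint and together cover all taxa."""
    seen = set()
    for subset in subsets:
        for name in subset:
            if name in seen:
                return False
            seen.add(name)
    return seen == set(taxa)


def toy_base_method(subset: Sequence[str]) -> str:
    """Stand-in base method: returns a caterpillar tree as a Newick string.

    A real pipeline would run a tree estimation method here.
    """
    newick = subset[-1]
    for name in reversed(subset[:-1]):
        newick = f"({name},{newick})"
    return newick + ";"


def build_constraint_trees(
    subsets: Sequence[Sequence[str]],
    base_method: Callable[[Sequence[str]], str],
) -> List[str]:
    """Apply the selected base method to every subset independently."""
    return [base_method(subset) for subset in subsets]


taxa = ["A", "B", "C", "D", "E"]
subsets = divide(taxa, max_size=2)
constraint_trees = build_constraint_trees(subsets, toy_base_method)
# The constraint trees would next be passed to a merging method,
# which is not sketched here.
```

Because each subset is processed independently, the base-method calls could in principle run in parallel, which is one reason such pipelines can reduce running time on large datasets.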

Highlights

The estimation of gene trees and species trees is a basic part of many biological analysis pipelines; gene trees have implications for trait evolution and the prediction of protein function and structure, while species trees are needed to understand how species adapt to their environments, to date speciation events, etc.
We examine the impact of Constrained-Incremental tree building (INC) for use in species tree estimation from multi-locus datasets, where gene trees can differ from the species tree due to incomplete lineage sorting (ILS).
Under low/moderate ILS conditions, both NJMerge and Constrained-INC were similar in accuracy to RAxML, but both were much faster than RAxML.

Summary

Introduction

The estimation of gene trees and species trees is a basic part of many biological analysis pipelines; gene trees have implications for trait evolution and the prediction of protein function and structure (as well as other applications), while species trees are needed to understand how species adapt to their environments, to date speciation events, etc. The estimation of both gene trees and species trees is based on statistical models of evolution, with gene trees based on a single locus within the genome of the different species, and species trees based on multiple loci.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Jan 1, 2021
Citations: 38
License type: CC BY 4.0

Similar Papers

To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods.
Erin K Molloy ... Tandy Warnow
Systematic Biology | VOL. 67
15 Sep 2017

The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life
Yan Du ... Shaoyuan Wu
BMC Evolutionary Biology | VOL. 19
06 Nov 2019

Sources of Error Inherent in Species-Tree Estimation: Impact of Mutational and Coalescent Effects on Accuracy and Implications for Choosing among Different Methods
Huateng Huang ... Qixin He
Systematic Biology | VOL. 59
10 Sep 2010

The performance of coalescent-based species tree estimation methods under models of missing data
Michael Nute ... Tandy Warnow
BMC Genomics | VOL. 19
01 May 2018
