Abstract

Motivation: Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological processes. Thus gene tree correction methods need to take estimation error as well as gene tree heterogeneity into account. Many prior gene tree correction methods have been developed for the case where GDL is present.

Results: Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to ILS and/or horizontal gene transfer (HGT). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-optimal tree refinement and completion (RF-OTRC) problem, which seeks a refinement and completion of a singly-labeled gene tree with respect to a given singly-labeled species tree so as to minimize the Robinson-Foulds (RF) distance. Our extensive simulation study on 68,000 estimated gene trees shows that TRACTION matches or improves on the accuracy of well-established methods from the GDL literature when HGT and ILS are both present, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. We also show that a naive generalization of the RF-OTRC problem to multi-labeled trees is possible, but can produce misleading results where gene tree heterogeneity is due to GDL.
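The RF distance that RF-OTRC minimizes can be illustrated with a minimal sketch. It assumes the standard definition (the number of non-trivial bipartitions found in exactly one of two trees on the same leaf set) and a hypothetical nested-tuple tree encoding; the function names are illustrative and this is not the TRACTION implementation.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Count the bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon example trees.
gene_tree = (("A", "B"), ("C", ("D", "E")))
species_tree = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(gene_tree, species_tree))
```

In this toy example the two trees share the split separating D and E from the rest and differ in one split each. A fully unresolved tree such as `("A", "B", "C", "D", "E")` has no non-trivial bipartitions, which is why refining it toward a reference tree can only reduce the distance to that reference.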

Highlights

  • Reconstructing the evolutionary history of a gene is a core task in phylogenetics, and our ability to infer these evolutionary relationships accurately can have important implications for a variety of downstream analyses

  • Here, we study the problem of gene tree correction where gene tree heterogeneity is due to incomplete lineage sorting (ILS) and/or horizontal gene transfer (HGT), rather than gene duplication and loss (GDL)

  • We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-optimal tree refinement and completion (RF-OTRC) problem, which seeks a refinement and completion of a singly-labeled gene tree with respect to a given singly-labeled species tree so as to minimize the Robinson-Foulds (RF) distance


Introduction

Reconstructing the evolutionary history of a gene is a core task in phylogenetics, and our ability to infer these evolutionary relationships accurately can have important implications for a variety of downstream analyses. Estimated gene trees are used in the inference of adaptation, evolutionary event detection (such as gene loss, gene duplication, and horizontal gene transfer), ortholog identification, analysis of functional trait evolution, and species tree estimation. Many phylogenomic datasets have gene trees with average branch support well below 75%, which is a common lower bound for branches to be considered reliable. Estimating gene and species trees is further complicated by biological processes such as gene duplication/loss (GDL), incomplete lineage sorting (ILS), and horizontal gene transfer (HGT), which create heterogeneous tree topologies across the genome [4]. HGT has long been known to cause problems for bacterial phylogenetics, and ILS by itself has emerged as a major issue in phylogenomics, affecting most, if not all, genome-scale datasets [5].

Christensen et al. Algorithms Mol Biol (2020) 15:1
