Abstract

Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure.

Key words: phylogenetics, evolution, tree metrics, genetics, sequencing.

Highlights

  • A fundamental challenge in the study of evolution is that for a given set of organisms, markedly different phylogenetic trees can be inferred from each combination of input data, software, and settings (Sullivan et al 1996; Rokas et al 2003; Jiang et al 2014)

  • Reasons for this include lack of informative data, differences between tree inference methods, conflicting signals from descent and selection, and the fact that evolution is not always tree-like: gene trees differ from species trees, and many organisms exchange genes horizontally

  • Quantitative, metric-based tree comparisons are an alternative to visual methods, but they currently suffer from drawbacks including counterintuitive behavior (Kuhner and Yamato 2014) and poor resolution (Hillis et al 2005)


Summary

Introduction

A fundamental challenge in the study of evolution is that for a given set of organisms, markedly different phylogenetic trees can be inferred from each combination of input data, software, and settings (Sullivan et al 1996; Rokas et al 2003; Jiang et al 2014). Phylogenetic uncertainty is often apparent following Bayesian Markov Chain Monte Carlo (MCMC) inference of trees from data with tools such as BEAST (Drummond et al 2012) and MrBayes (Huelsenbeck and Ronquist 2001). These tools produce large posterior collections of trees which can include considerable variety and are hard to summarize. In the widely used Robinson–Foulds (RF) unweighted metric (Robinson and Foulds 1981), known as the “symmetric difference,” many pairs of trees are the same distance apart, and large distances between trees do not imply large differences among the shared ancestry of most tips (Steel and Penny 1993). These limitations hamper the examination of Bayesian posterior collections of trees, so posterior distributions are typically summarized with a single maximum clade credibility (MCC) tree together with edge support values that describe the location and extent of uncertainty. How that uncertainty arises from the ancestral patterns in the data is not revealed; using a single summary tree carries the drawback that crucial information can be lost (Heled and Bouckaert 2013).
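The behavior of the RF metric described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes rooted trees written as nested Python tuples with string tip labels, and it uses the rooted variant of the symmetric difference, which counts clades present in one tree but not the other. The function names `clades` and `rf_distance` and the three example trees are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of tip labels.

    Assumption: a tree is a nested tuple and a tip is a string.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    all_tips = tips(tree)
    found.discard(all_tips)  # the root clade is shared by every tree on these tips
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted symmetric difference: clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# A ladder-shaped tree on six tips.
t1 = ((((("A", "B"), "C"), "D"), "E"), "F")
# The same tree with the single tip A moved to the root; B..F are related as before.
t2 = ((((("B", "C"), "D"), "E"), "F"), "A")
# A tree that shares no non-trivial clade with t1.
t3 = ((("A", "D"), ("B", "F")), ("C", "E"))

print(rf_distance(t1, t2))  # 8
print(rf_distance(t1, t3))  # 8
```

In this toy case, moving one tip gives the same distance (8, the maximum for six tips under this rooted variant) as comparing against a tree with no clades in common, even though `t1` and `t2` agree on the relationships among five of the six tips.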

