Abstract
Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Most empirical studies of evolutionary data include a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that arise when displaying branch values on trees after rerooting. Branch values are typically stored as node labels in the widely used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when rerooting trees. Whether errors occur depends on the mostly implicit semantics that tools use to interpret node labels. We reviewed ten tree viewers and ten bioinformatics toolkits that can display and reroot trees. We found that 14 of these 20 tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements in eight tools. We suggest that tools should provide options that explicitly force users to define the semantics of node labels.
Highlights
We examine different popular tree viewers and several bioinformatics toolkits to determine if they maintain the correct branch value mapping when re-rooting our test tree TN at the branch leading to tip node X
Our results indicate that an explicit convention and explicit semantics for interpreting node and branch values in tree viewers and other common bioinformatics tools are clearly missing
Only three tools (Archaeopteryx, ETE, and Dendroscope from v. 3.5.0 onwards) offer a user dialog to define the semantics of node labels
Summary
The Newick format is widely used to store and visualize phylogenies. Archie et al. introduced it in 1986 [1]. The mapping between branch values and node labels in Newick files is well defined in principle: to restore the correct association between node labels and branches, the direction towards the top-level node (root or top-level trifurcation) can be used. Not every node label is a branch value, however. A tree inferred with a time-asymmetric model might contain posterior support values that belong to nodes rather than to branches/splits of the tree. As another example, inner node labels that represent clade names (e.g., “Mammalia”) are attributes associated with one direction of a branch (only mammals in one part of the split induced by the branch, none in the other). We focus on the distinction between node labels and branch values here, and use re-rooting to reveal the internal workings of the tested tools.
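The effect of the two interpretations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the small tree `(((X,A)75,B)90,C,D);` is hypothetical (it is not the test tree TN), the internal node names `n1`, `n2`, `R` and the function names are invented, and no reviewed tool is claimed to work this way. The sketch reroots the tree on the branch leading to tip X and compares labels that stay on their nodes with labels that follow their branches.

```python
from collections import defaultdict

# Hypothetical rooted tree (((X,A)75,B)90,C,D); stored as child -> parent.
# R is the top-level trifurcation; n1 and n2 are invented inner node names.
PARENT = {"X": "n1", "A": "n1", "n1": "n2", "B": "n2",
          "n2": "R", "C": "R", "D": "R"}
# Inner node labels exactly as they appear in the Newick string.
NODE_LABEL = {"n1": "75", "n2": "90"}


def reroot_parents(parent, tip):
    """Return a child -> parent map after placing a new root
    (named "root") on the branch leading to `tip`."""
    adj = defaultdict(set)
    for child, par in parent.items():
        adj[child].add(par)
        adj[par].add(child)
    attach = parent[tip]
    # Cut the branch tip-attach; both ends become children of the new root.
    adj[tip].discard(attach)
    adj[attach].discard(tip)
    new_parent = {tip: "root", attach: "root"}
    stack = [attach]
    while stack:
        node = stack.pop()
        for neighbour in adj[node]:
            if neighbour not in new_parent:
                new_parent[neighbour] = node
                stack.append(neighbour)
    return new_parent


def labels_as_node_labels(node_label):
    """Node semantics: each label stays on the node it was written on."""
    return dict(node_label)


def labels_as_branch_values(parent, node_label, new_parent):
    """Branch semantics: each label belongs to the undirected branch between
    a node and its ORIGINAL parent, and is shown on whichever end of that
    branch is the child after rerooting."""
    branch_value = {frozenset((c, parent[c])): v for c, v in node_label.items()}
    shown = {}
    for child, par in new_parent.items():
        value = branch_value.get(frozenset((child, par)))
        if value is not None:
            shown[child] = value
    return shown


rerooted = reroot_parents(PARENT, "X")
as_nodes = labels_as_node_labels(NODE_LABEL)
as_branches = labels_as_branch_values(PARENT, NODE_LABEL, rerooted)
print("node semantics:  ", sorted(as_nodes.items()))
print("branch semantics:", sorted(as_branches.items()))
```

In this example the value 75 describes the split {X,A} | {B,C,D}. After rerooting, that branch connects `n1` to its new child `n2`, so under branch semantics 75 moves to `n2` and 90 moves to `R`. Under node semantics 75 remains on `n1`, whose parent branch is now the branch to X, so a viewer that draws node labels as branch values would display both values on the wrong branches. The sketch only covers rerooting on a tip branch of a tree with a top-level trifurcation; rerooting on a branch that itself carries a value, or on a tree with a bifurcating root, would need additional handling.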