A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species.

Benjamin D Redelings, Mark T Holder

doi:10.7717/peerj.3058

Open Access

https://doi.org/10.7717/peerj.3058

Abstract

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support and conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on the input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all versions of the project’s “synthetic tree” starting at version 5. This software pipeline is called “propinquity”. It relies heavily on “otcetera”, a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.
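The counts of displayed and conflicting input splits reported above rest on comparing each input grouping with the groupings of the supertree. The following Python sketch illustrates one common way to make that comparison for rooted trees. It is not taken from otcetera: the function name `classify_split`, the representation of a tree as a collection of leaf sets (clusters), and the exact definitions of "displayed" and "conflicting" used here are simplifying assumptions for illustration.

```python
from typing import FrozenSet, Iterable

Cluster = FrozenSet[str]


def classify_split(
    input_cluster: Cluster,
    input_leaves: Cluster,
    supertree_clusters: Iterable[Cluster],
) -> str:
    """Classify one input grouping against a supertree (illustrative only).

    Assumed definitions, not necessarily those of the paper:
    - "displayed": some supertree cluster, restricted to the leaves of the
      input tree, equals the input cluster.
    - "conflicting": some restricted supertree cluster overlaps the input
      cluster while neither contains the other.
    - "neither": the supertree is merely unresolved with respect to it.
    """
    conflict = False
    for cluster in supertree_clusters:
        restricted = cluster & input_leaves
        if restricted == input_cluster:
            return "displayed"
        inside = restricted & input_cluster
        # Overlap without nesting: the two sets share leaves, but each
        # also has leaves that the other lacks.
        if inside and inside != input_cluster and inside != restricted:
            conflict = True
    return "conflicting" if conflict else "neither"


# Hypothetical supertree ((A,B),(C,D)) written as its clusters.
supertree = [
    frozenset({"A", "B"}),
    frozenset({"C", "D"}),
    frozenset({"A", "B", "C", "D"}),
]
leaves = frozenset({"A", "B", "C"})
status_ab = classify_split(frozenset({"A", "B"}), leaves, supertree)
status_bc = classify_split(frozenset({"B", "C"}), leaves, supertree)
```

Under these assumptions the input group {A, B} is displayed by the hypothetical supertree, while {B, C} conflicts with it. A fully unresolved supertree would neither display nor conflict with either group.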

Highlights

The Open Tree of Life project seeks to build a platform for summarizing what is known about phylogenetic relationships across all of Life (Hinchliff et al., 2015).
We have described the motivation and methodology of our new supertree method, which is currently used by the Open Tree of Life project to build summary supertrees on the scale of millions of leaves.
When applied to the same inputs, our new method represented 11% more input phylogeny splits with 51% less conflict than the Open Tree of Life version 4 summary tree.

Summary

BACKGROUND

The Open Tree of Life project seeks to build a platform for summarizing what is known about phylogenetic relationships across all of Life (Hinchliff et al., 2015).

MinCut deals with conflicts by modifying the BUILD algorithm to resolve incompatibilities by discarding edges that are present in the smallest number of input trees. This approach violates our Goal #2 of resolving conflict via ranks that can be altered by a curator to influence the output tree.

The ID may be associated with a set of flags that can indicate that the taxon may be questionable. These flags can either encode information taken from an input taxonomy (for example, taxa that the NCBI refers to as “unplaced” are assigned an “unplaced” flag) or can arise because of some form of conflict during taxonomy construction (for example, if two taxonomies disagree on the name for a taxon, the taxon can be merged and the name will be retained without any descendants; this name will have an OTT ID, but will be flagged as “barren”).

A terminal taxon that is represented only in the taxonomy can be pruned and regrafted onto the solution without affecting which nodes are displayed by the final summary tree.

After producing the set of “exemplified” phylogenetic inputs, this tool exports a pruned-down version of the taxonomy that contains only tips that are present in at least one phylogenetic input.
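The taxonomy-pruning step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the otcetera implementation: the function name `prune_taxonomy`, the child-to-parent dictionary representation, and the taxon labels are all hypothetical, and the sketch assumes a well-formed taxonomy whose root maps to `None`.

```python
from typing import Dict, Optional, Set


def prune_taxonomy(
    parent: Dict[str, Optional[str]],
    sampled_tips: Set[str],
) -> Dict[str, Optional[str]]:
    """Keep only sampled tips and their ancestors (illustrative only).

    `parent` maps each taxon to its parent taxon; the root maps to None.
    Tips that are absent from the taxonomy are ignored.
    """
    keep: Set[str] = set()
    for tip in sampled_tips:
        if tip not in parent:
            continue
        node: Optional[str] = tip
        # Walk toward the root, stopping once we reach a kept ancestor.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent[node]
    return {taxon: par for taxon, par in parent.items() if taxon in keep}


# Hypothetical taxonomy: root -> (A -> (a1, a2), B -> (b1)).
taxonomy = {
    "root": None,
    "A": "root",
    "B": "root",
    "a1": "A",
    "a2": "A",
    "b1": "B",
}
pruned = prune_taxonomy(taxonomy, {"a1"})
```

In this toy case only `a1`, its parent `A`, and `root` survive, so the only remaining tip is one that appears in a phylogenetic input. The sketch leaves nodes with a single child in place; whether and where such nodes are suppressed in the real pipeline is not stated here.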

Summary tree construction

RESULTS

CONCLUSIONS

Journal: PeerJ	Publication Date: Mar 1, 2017
Citations: 52	License type: CC BY 4.0
