Abstract

Background: Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies of viral relationships, tree nodes are associated with epidemiological information such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past; however, none of them are intended for integrative and comparative analysis.

Results: Treelink is platform-independent software for linking datasets and sequence files to phylogenetic trees. The application automates the integration of datasets with trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes across branches and leaves. Genomic and proteomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm was also developed to simplify trees and display the most divergent clades; its results can be validated using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms.

Conclusion: Our software can successfully integrate phylogenetic trees with different data sources and perform operations to differentiate groups within a tree and visualize those differences. Supported file formats include the most popular ones, such as Newick and CSV. Visualizations can be exported as images, and cluster outputs and genomic sequences can be exported as well.
Treelink is available as a web and desktop application at http://www.treelinkapp.com.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0860-1) contains supplementary material, which is available to authorized users.
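Parsimony-based ancestral reconstruction of a discrete trait such as sampling location can be illustrated with the classic Fitch algorithm. The sketch below is not Treelink's implementation: the `Node` class, the functions `fitch_up` and `fitch_down`, the example tree and the location labels are all hypothetical. It assumes a fully bifurcating tree and breaks ties alphabetically; likelihood-based reconstruction is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    # Hypothetical tree node: leaves carry a name, internal nodes carry children.
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    candidates: Set[str] = field(default_factory=set)  # Fitch state set
    state: Optional[str] = None                         # reconstructed state


def fitch_up(node: Node, leaf_states: Dict[str, str]) -> int:
    """Bottom-up Fitch pass (binary tree assumed).

    Fills each node's candidate set and returns the minimum number
    of state changes needed to explain the leaf states.
    """
    if not node.children:
        node.candidates = {leaf_states[node.name]}
        return 0
    changes = sum(fitch_up(child, leaf_states) for child in node.children)
    sets = [child.candidates for child in node.children]
    common = set.intersection(*sets)
    if common:
        node.candidates = common
    else:
        # No shared state: take the union and count one change.
        node.candidates = set.union(*sets)
        changes += 1
    return changes


def fitch_down(node: Node, parent_state: Optional[str] = None) -> None:
    """Top-down pass: keep the parent's state when allowed, else pick one."""
    if parent_state is not None and parent_state in node.candidates:
        node.state = parent_state
    else:
        node.state = min(node.candidates)  # alphabetical tie-break (assumption)
    for child in node.children:
        fitch_down(child, node.state)


# Hypothetical example: ((A,B),(C,D)) with one location per leaf.
ab = Node(children=[Node("A"), Node("B")])
cd = Node(children=[Node("C"), Node("D")])
root = Node(children=[ab, cd])
locations = {"A": "Brazil", "B": "Brazil", "C": "Peru", "D": "Brazil"}

print(fitch_up(root, locations))  # 1
fitch_down(root)
print(root.state, cd.state)       # Brazil Brazil
```

Each internal node then has a single inferred location, which is the kind of per-node assignment a phylogeographic plot needs; when several states are equally parsimonious, the tie-break rule decides which one is drawn.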

Highlights

  • Phylogenetic trees are central to a wide range of biological studies

  • A wide variety of visualization and annotation tools have been developed in the past [2,3,4], but none of them are intended for integrative and comparative analysis

  • Representation of large phylogenies and clustering of nodes have proven to be difficult in epidemiology and evolutionary research [6], where large and complex trees are used for exploration and pattern analysis


Summary

Results

Output
The main outputs of Treelink comprise browser-viewable and downloadable SVG graphics for the following results:

Other existing tools require manual annotation of metadata to associate or attach information to selected tree elements. Treelink overcomes this requirement by using standard dataset formats as an integration source. This relieves the user of tasks such as manually annotating leaves and permits integrating associated data directly from the sources of data collection, given that CSV is a popular export format of SQL-based databases, Excel and other spreadsheets. Another advantage is the number of fields that can be linked to the tree: up to 9 different fields can be integrated. The result was linked and plotted on a geographical
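Linking a CSV dataset to a tree amounts to matching a key column against leaf labels and then summarizing a chosen field over the leaves of each clade. The following is a minimal sketch of that idea, not Treelink's code. It assumes the CSV has a hypothetical `id` column whose values equal the leaf names, and it uses a deliberately minimal Newick parser (no quoted labels, comments or embedded whitespace). The functions `parse_newick`, `leaves` and `clade_distribution` and the example data are illustrative only.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Union

Tree = Union[str, list]  # leaf = name string, internal node = list of children


def parse_newick(text: str) -> Tree:
    """Minimal Newick parser; internal labels and branch lengths are discarded."""
    text = text.strip()
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]
        if pos < len(text) and text[pos] == ":":  # skip ":branch_length"
            pos += 1
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
        return label

    def parse_node() -> Tree:
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
            read_label()                  # ignore internal-node label
            return children
        return read_label()

    return parse_node()


def leaves(node: Tree) -> List[str]:
    """All leaf names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clade_distribution(tree: Tree, table: Dict[str, Dict[str, str]], column: str):
    """For every internal node, count the values of `column` over its leaves.

    Leaves without a matching CSV row are skipped.
    """
    result = []

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        tips = leaves(node)
        counts = Counter(table[t][column] for t in tips if t in table)
        result.append((tips, counts))
        for child in node:
            walk(child)

    walk(tree)
    return result


CSV_TEXT = """id,location,subtype
A,Brazil,H1N1
B,Brazil,H3N2
C,Peru,H1N1
"""
rows = {row["id"]: row for row in csv.DictReader(io.StringIO(CSV_TEXT))}
tree = parse_newick("((A:0.1,B:0.2)0.9:0.3,C);")

for tips, counts in clade_distribution(tree, rows, "location"):
    print(tips, dict(counts))
# ['A', 'B', 'C'] {'Brazil': 2, 'Peru': 1}
# ['A', 'B'] {'Brazil': 2}
```

Because every CSV column is available in `rows`, switching the summarized attribute (for example from `location` to `subtype`) only changes the `column` argument; no per-leaf manual annotation is needed.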

