Abstract

Phylogenetics is now central to studies in many fields, ranging from comparative genomics to molecular epidemiology. However, phylogenetic analysis workflows are usually complex and difficult to implement, as they are often composed of many small, recurring, but important data manipulation steps. These include file reformatting, sequence renaming, tree re-rooting, tree comparison and bootstrap support computation. Such steps are often performed by custom scripts or by several heterogeneous tools, which can be error-prone and hard to maintain, and which produce results that are challenging to reproduce. For all these reasons, the development and reuse of phylogenetic workflows is often a complex task. We identified many operations that are part of most phylogenetic analyses, and implemented them in a toolkit called Gotree/Goalign. The Gotree/Goalign toolkit implements more than 120 user-friendly commands and an API dedicated to the manipulation of multiple sequence alignments and phylogenetic trees. It is developed in Go, which makes executables easy to install, easy to integrate in workflow environments, and parallelizable when possible. Moreover, Go is a compiled language, which accelerates computations compared to interpreted languages. This toolkit is freely available for most platforms (Linux, macOS and Windows) and most architectures (amd64, i386) from GitHub (https://github.com/evolbioinfo/gotree), Bioconda and DockerHub.

Highlights

  • The increase in computing power and the development of bioinformatics methods that handle very large datasets make it possible to perform phylogenetic analyses at an unprecedented scale

  • We developed Gotree/Goalign, a toolkit dedicated to burdensome and repetitive phylogenetic tasks, which (i) consists of two user-friendly executables, gotree and goalign, that integrate state-of-the-art phylogenetic commands and require no programming skills to use, (ii) provides a set of chainable commands that can be integrated in workflows (e.g. Nextflow [18] or Snakemake [19]), (iii) is straightforward to install via static binaries available for most platforms and architectures, and (iv) provides a public API for developers who want to manipulate phylogenetic trees and multiple sequence alignments in Go

  • We developed the Gotree/Goalign toolkit to simplify the manipulation of phylogenetic trees and alignments, and to facilitate the development of reproducible phylogenetic workflows

Summary

Introduction

The increase in computing power and the development of bioinformatics methods that handle very large datasets make it possible to perform phylogenetic analyses at an unprecedented scale. The main objective of this workflow is to analyse a phylogenomic dataset to infer a species tree and compare it to a reference tree. It contains 19 steps, most of which are not the usual compute-intensive tasks (e.g. tree inference and multiple sequence alignment) but rather alignment and tree manipulations such as downloading, renaming, reformatting, comparing, rerooting and annotating. These tasks are (i) repetitive (found in many similar workflows), (ii) tedious to implement (there are many ways to do each of them) and (iii) error-prone, which makes this workflow difficult to implement, describe and reproduce.
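
To make one of the manipulation steps named above concrete (sequence renaming), here is a minimal sketch that uses only the Go standard library. It is not the Gotree/Goalign API or implementation: the function name `renameFasta`, the sample sequences and the name map are hypothetical, and the sketch assumes a simple FASTA input in which the whole header line is the sequence name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// renameFasta (hypothetical helper, not part of Gotree/Goalign) rewrites
// FASTA header lines using an old-name -> new-name map. Names absent from
// the map are left unchanged; sequence lines are copied as-is.
func renameFasta(fasta string, names map[string]string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(fasta))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, ">") {
			// Assumption: the full header (minus ">") is the sequence name.
			old := strings.TrimSpace(line[1:])
			if renamed, ok := names[old]; ok {
				line = ">" + renamed
			}
		}
		out.WriteString(line)
		out.WriteByte('\n')
	}
	// The default Scanner buffer rejects very long lines; report that error.
	if err := sc.Err(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Toy alignment with two sequences; only seq1 is in the rename map.
	aln := ">seq1\nACGT\n>seq2\nAC-T\n"
	renamed, err := renameFasta(aln, map[string]string{"seq1": "Homo_sapiens"})
	if err != nil {
		panic(err)
	}
	fmt.Print(renamed)
}
```

Even a step this small has to settle details such as how headers are parsed and what happens to unmapped names, which is the kind of choice that custom scripts tend to make inconsistently.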
