GToTree: a user-friendly workflow for phylogenomics.

Michael D Lee, Yann Ponty

doi:10.1093/bioinformatics/btz188

Open Access

Abstract

Summary: Genome-level evolutionary inference (i.e. phylogenomics) is becoming an increasingly essential step in many biologists’ work. Accordingly, there are several tools available for the major steps in a phylogenomics workflow. But for the biologist whose main focus is not bioinformatics, much of the computational work required (such as accessing genomic data on large scales, integrating genomes from different file formats, performing required filtering and stitching different tools together) can be prohibitive. Here I introduce GToTree, a command-line tool that can take any combination of fasta files, GenBank files and/or NCBI assembly accessions as input and outputs an alignment file, estimates of genome completeness and redundancy, and a phylogenomic tree based on a specified single-copy gene (SCG) set. Although GToTree can work with any custom hidden Markov Models (HMMs), also included are 13 newly generated SCG-set HMMs for different lineages and levels of resolution, built based on searches of ∼12 000 bacterial and archaeal high-quality genomes. GToTree aims to give more researchers the capability to make phylogenomic trees.

Availability and implementation: GToTree is open-source and freely available for download from: github.com/AstrobioMike/GToTree. It is implemented primarily in bash with helper scripts written in Python.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

The number of sequenced genomes is increasing rapidly, largely through the recovery of metagenome-assembled genomes (MAGs) (e.g. Hug et al 2016; Parks et al 2017) and through the generation of single-cell amplified genomes (SAGs) (e.g. Kashtan et al 2014; Berube et al 2018).
Large-scale comparative genomics efforts leveraging growing public databases can be employed to investigate evolutionary avenues such as ancestral reconstruction (Braakman, Follows, and Chisholm 2017), which are guided by phylogenomics.
There are several tools available for the major steps in a phylogenomics workflow, and at least one analysis platform that incorporates a phylogenomics workflow amid a larger infrastructure.

Introduction

The number of sequenced genomes is increasing rapidly, largely through the recovery of metagenome-assembled genomes (MAGs) (e.g. Hug et al 2016; Parks et al 2017) and through the generation of single-cell amplified genomes (SAGs) (e.g. Kashtan et al 2014; Berube et al 2018). GToTree fills a void on three primary fronts: 1) it accepts as input any combination of fasta files, GenBank files, and/or NCBI accessions, allowing integration of genomes from various sources and stages of analysis without any computational burden to the user; 2) it automates required between-tool tasks such as filtering out hits by gene length, filtering out genomes with too few hits to the specified target genes, and swapping genome identifiers so resulting trees and alignments can be explored more easily; and 3) it scales well: GToTree can turn ~1,700 input genomes into a tree in one hour on a standard laptop, and can optionally run many steps in parallel. The required inputs to GToTree are 1) any combination of fasta files, GenBank files, and/or NCBI assembly accessions, and 2) an HMM file with the target genes. The user can also provide a mapping file that pairs specific input genome IDs with the labels they would like to have displayed in the final alignment and tree.
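The between-tool tasks listed above (length-based hit filtering, dropping genomes with too few hits, and swapping genome identifiers) can be illustrated with a minimal Python sketch. This is not GToTree's own code: the function names, the input layout (a dict of gene to genome to hit length) and the threshold values 0.2 and 0.5 are all assumptions chosen for illustration.

```python
from statistics import median


def filter_hits_by_length(hits, max_deviation=0.2):
    """Keep hits whose length is within max_deviation of the gene's median.

    hits: {gene: {genome_id: hit_length}} (hypothetical layout).
    max_deviation: allowed fractional distance from the median (assumed value).
    """
    kept = {}
    for gene, lengths in hits.items():
        med = median(lengths.values())
        kept[gene] = {
            genome: n
            for genome, n in lengths.items()
            if abs(n - med) <= max_deviation * med
        }
    return kept


def filter_genomes_by_hits(hits, min_fraction=0.5):
    """Return genome IDs with hits to at least min_fraction of the target genes."""
    n_genes = len(hits)
    counts = {}
    for lengths in hits.values():
        for genome in lengths:
            counts[genome] = counts.get(genome, 0) + 1
    return {g for g, c in counts.items() if c / n_genes >= min_fraction}


def relabel(genome_ids, mapping):
    """Swap genome IDs for user-supplied labels; unmapped IDs are kept as-is."""
    return [mapping.get(g, g) for g in genome_ids]


# Toy data: g3 has one hit to geneA that is far longer than the others.
hits = {
    "geneA": {"g1": 100, "g2": 105, "g3": 300},
    "geneB": {"g1": 200, "g2": 210},
}
length_filtered = filter_hits_by_length(hits)
retained = filter_genomes_by_hits(length_filtered)
labels = relabel(sorted(retained), {"g1": "Prochlorococcus_sp"})
print(labels)
```

In this toy case the overlong geneA hit from g3 is removed first, which leaves g3 with no hits at all, so it is dropped at the genome-filtering step. Running the length filter before the genome filter is a design choice of the sketch, made so that spurious hits do not count toward a genome's total.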


Journal: Computer applications in the biosciences : CABIOS	Publication Date: Mar 13, 2019
Citations: 273	License type: CC BY 4.0
