Abstract

Microbial genomes are available at an ever-increasing pace, as cultivation and sequencing become cheaper and obtaining metagenome-assembled genomes (MAGs) becomes more effective. Phylogenetic placement methods to contextualize hundreds of thousands of genomes must thus be efficiently scalable and sensitive from closely related strains to divergent phyla. We present PhyloPhlAn 3.0, an accurate, rapid, and easy-to-use method for large-scale microbial genome characterization and phylogenetic analysis at multiple levels of resolution. PhyloPhlAn 3.0 can assign genomes from isolate sequencing or MAGs to species-level genome bins built from >230,000 publicly available sequences. For individual clades of interest, it reconstructs strain-level phylogenies from among the closest species using clade-specific maximally informative markers. At the other extreme of resolution, it scales to large phylogenies comprising >17,000 microbial species. Examples including Staphylococcus aureus isolates, gut metagenomes, and meta-analyses demonstrate the ability of PhyloPhlAn 3.0 to support genomic and metagenomic analyses.

Highlights

  • Genomes from isolate sequencing, metagenomic assembly, and single-cell sequencing are being generated at an increasing pace, and they are all correspondingly increasingly available through public resources

  • Multi-resolution phylogenetic reconstruction is at the core of the approach to assign taxonomic labels from phylum to species level to input genomes or metagenome-assembled genomes (MAGs), which exploits >150,000 MAGs and >80,000 reference genomes integrated into the PhyloPhlAn 3.0 database

  • PhyloPhlAn 3.0 is not bound to particular methodological choices for the internal steps: it allows users to choose among multiple tools for sequence mapping[27,28,29], MSA[10,11,13,30], and post-processing of the alignments[31], with phylogenetic models ranging from maximum-likelihood methods applied on concatenated alignments[6,16,22] to gene tree approaches integrating the information of multiple distinct markers[18,21]
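
The "concatenated alignments" mentioned in the last highlight can be illustrated with a minimal sketch. It is not PhyloPhlAn 3.0 code: the function name `concatenate_alignments`, the dictionary-based input format, and the choice to pad missing markers with gaps are all assumptions made for illustration. The sketch joins per-marker alignments into a single supermatrix of the kind a maximum-likelihood tool would take as input.

```python
from typing import Dict, List


def concatenate_alignments(
    marker_alignments: List[Dict[str, str]], gap: str = "-"
) -> Dict[str, str]:
    """Join per-marker alignments into one supermatrix (hypothetical helper).

    Each element of `marker_alignments` maps a genome ID to its aligned
    sequence for one marker. A genome that lacks a marker is padded with
    gap characters so that every output row has the same length.
    """
    # Collect every genome seen in at least one marker alignment.
    genomes = sorted({g for aln in marker_alignments for g in aln})
    parts: Dict[str, List[str]] = {g: [] for g in genomes}

    for aln in marker_alignments:
        # All sequences within one marker alignment must share one width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in a marker alignment must have equal length")
        width = lengths.pop()
        for g in genomes:
            parts[g].append(aln.get(g, gap * width))

    return {g: "".join(chunks) for g, chunks in parts.items()}


# Toy input: two markers, three genomes, each genome missing at most one marker.
markers = [
    {"genomeA": "MK-L", "genomeB": "MKAL"},
    {"genomeA": "GG", "genomeC": "GA"},
]
result = concatenate_alignments(markers)
for genome, row in result.items():
    print(genome, row)
```

A gene tree approach would instead infer one tree per marker alignment and then reconcile the trees, so it would skip this concatenation step.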

Introduction

Genomes from isolate sequencing, metagenomic assembly, and single-cell sequencing are being generated at an increasing pace, and they are correspondingly increasingly available through public resources. Existing tools for placing such genomes in a phylogenetic context include the first implementation of PhyloPhlAn[1], PhyloSift[2], ezTree[3], GToTree[4], and AMPHORA[5], among many others for more general genome- and gene-based phylogenetics[6,7]. Most of these methods are limited in at least one way that makes it hard to link newly sequenced genomes, or metagenomic assemblies, into the tremendous space of already characterized microbial phylogenies. While computational methods for genome assembly of isolate sequencing and for quantitative analysis of known features of metagenomic data are mature and well standardized, comparably convenient and automatic tools for downstream phylogenetic and taxonomic assessment of MAGs and microbial isolate genomes are lacking, which limits microbial genomic analyses. These end-to-end phylogenetic solutions should be differentiated from algorithms and implementations for individual steps of genome placement (e.g., pplacer[8] and SEPP[9]) and taxonomic assessment. We present here PhyloPhlAn 3.0, a fully automatic, end-to-end phylogenetic analysis framework for contextualization and characterization of newly assembled microbial isolate genomes and MAGs.
