Abstract

Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into ‘bins’ representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for recovering species bins from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies ‘training’ sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have.

Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4–6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows the analysis of Gb-sized metagenomes with inexpensive hardware and the recovery of species- or genus-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods.

Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X at: https://github.com/algbioi/ppsp/wiki.
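To make the idea of counting 4–6-mers simultaneously concrete, the following is a minimal Python sketch. It is not the PhyloPythiaS+ algorithm, whose implementation is not described in this abstract; it is a naive illustration that visits each sequence position once and records the 4-, 5- and 6-mer starting there. The function name `count_kmers`, its parameters, and the choice to skip k-mers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical helper for illustration only; not part of PhyloPythiaS+.
ALPHABET = frozenset("ACGT")

def count_kmers(sequence, k_min=4, k_max=6):
    """Count every k-mer with k_min <= k <= k_max in a single pass.

    Returns a dict mapping each k to a Counter of k-mer occurrences.
    K-mers containing characters other than A, C, G, T are skipped
    (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = {k: Counter() for k in range(k_min, k_max + 1)}
    # Visit each start position once and emit all k-mers beginning there.
    for start in range(len(seq) - k_min + 1):
        window = seq[start:start + k_max]
        for k in range(k_min, min(k_max, len(window)) + 1):
            kmer = window[:k]
            if ALPHABET.issuperset(kmer):
                counts[k][kmer] += 1
    return counts
```

For example, `count_kmers("ACGTACGT")[4]["ACGT"]` evaluates to 2, because the 4-mer ACGT starts at positions 0 and 4. A production implementation would likely use a more compact encoding than Python strings; this sketch only shows what is being counted.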

Highlights

  • Metagenomics is the functional or sequence-based analysis of microbial DNA isolated directly from a microbial community of interest (Riesenfeld, Schloss & Handelsman, 2004; Kunin et al, 2008)

  • We evaluated PPS+ by comparing it to homology-based methods (MEGAN4, taxator-tk) (Huson et al, 2011; Dröge, Gregor & McHardy, 2014), the fast taxonomic binning program Kraken (Wood & Salzberg, 2014), the composition-based method PhyloPythia trained under expert guidance, and a generic PPS model using default settings (Supplemental Information 1, Sections 3.5–3.8)

  • PPS has already been compared, with favorable outcomes, to methods with prohibitive runtimes for large datasets, such as PhymmBL (Brady & Salzberg, 2011) and CARMA3 (Gerlach & Stoye, 2011), and to the web-based Naïve Bayes classifier (NBC) (Rosen, Reichenberger & Rosenfeld, 2011); for these comparisons, see Patil et al (2011); Patil, Roune & McHardy (2011); Dröge, Gregor & McHardy (2014)


Summary

Introduction

Metagenomics is the functional or sequence-based analysis of microbial DNA isolated directly from a microbial community of interest (Riesenfeld, Schloss & Handelsman, 2004; Kunin et al, 2008). The taxonomic classification or ‘binning’ of metagenome samples is often performed after sequence assembly (Peng et al, 2011; Laserson, Jojic & Koller, 2011; Boisvert et al, 2012; Namiki et al, 2012; Pell et al, 2012). Assembly is a computationally demanding task, which for metagenome samples results in a mixture of sequence fragments of varying lengths, originating from the different microbial community members.

