Abstract

Increasingly available microbial reference data allow the composition and function of previously uncharacterized microbial communities to be interpreted in detail via high-throughput sequencing analysis. However, efficient methods for read classification are required, because the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step that directly assigns reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. In our simulations, reads with 5% added differences from the reference were functionally classified by PRROMenade with an error rate of only 1.5%. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.

Introduction

Microbiome analysis involves determining the composition and function of the community of microorganisms in a particular locale (Claesson et al, 2017). Microbial sequences can be organized in a hierarchical annotation tree, e.g., a taxonomy of genomes or functional units. Given this hierarchical organization, an optimal strategy would assign each read directly to the relevant lowest taxonomic unit (LTU) in the tree. This paper focuses on efficient functional classification of microbiome sequencing reads in terms of a functional taxonomy (such as KEGG enzyme codes; Kanehisa et al, 2017). The challenge is to assign reads to the relevant LTU efficiently and accurately, given a large database of sequences that have been annotated with a functional taxonomy.
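
To make the idea of assigning a multi-matching read to a single tree node concrete, the following is a minimal sketch, not PRROMenade's implementation. It assumes that the LTU for a read whose best matches carry several labels can be taken as the lowest common ancestor of those labels in the annotation tree. The toy tree `PARENT`, its enzyme-code-like labels, and the helpers `path_from_root` and `lowest_common_label` are hypothetical names introduced only for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy annotation tree: child label -> parent label.
# The root maps to None. Labels imitate enzyme-code style but are illustrative.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "1": "root",
    "1.1": "1",
    "1.1.1": "1.1",
    "1.1.1.1": "1.1.1",
    "1.1.1.2": "1.1.1",
    "1.2": "1",
    "1.2.1": "1.2",
    "2": "root",
}


def path_from_root(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list of labels from the root down to `label`."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


def lowest_common_label(labels: Iterable[str],
                        parent: Dict[str, Optional[str]]) -> str:
    """Return the deepest node that is an ancestor of (or equal to) every label.

    Assumption: the tree has a single root, so a common node always exists.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("at least one label is required")
    common = path_from_root(labels[0], parent)
    for label in labels[1:]:
        other = path_from_root(label, parent)
        # Keep only the shared prefix of the two root-to-node paths.
        shared = 0
        for a, b in zip(common, other):
            if a != b:
                break
            shared += 1
        common = common[:shared]
    return common[-1]


if __name__ == "__main__":
    # A read matching two sibling leaves is assigned to their shared parent.
    print(lowest_common_label(["1.1.1.1", "1.1.1.2"], PARENT))
    # A read matching leaves in different subtrees moves further up the tree.
    print(lowest_common_label(["1.1.1.1", "1.2.1"], PARENT))
```

Walking parent pointers per read, as done here, is only meant to show what the assigned label looks like. The approach described in the abstract instead attaches labels during indexing so that classification does not need a separate tree traversal for every match.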
