Abstract
The growing availability of microbial reference data allows the composition and function of previously uncharacterized microbial communities to be interpreted in detail through high-throughput sequencing analysis. However, efficient methods for read classification are required, because the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. In simulations with 5% added differences from the reference, PRROMenade functional classification had an error rate of only 1.5%. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.
Summary
Microbiome analysis involves determining the composition and function of the community of microorganisms in a particular locale (Claesson et al., 2017). Microbial sequences can be organized in terms of a hierarchical annotation tree, e.g., a taxonomy of genomes or functional units. In light of these observations, an optimal strategy would assign reads directly to the relevant lowest taxonomic unit (LTU) in a taxonomic tree. This paper focuses on efficient functional classification of microbiome sequencing reads in terms of a functional taxonomy (such as KEGG enzyme codes; Kanehisa et al., 2017). The challenge is to assign reads efficiently and accurately to the relevant LTU, given a large database of sequences that have been annotated with a functional taxonomy.
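To make the idea of assigning a multi-matching read to a single node concrete, the following is a minimal sketch, not PRROMenade's implementation (which performs labeling inside a Burrows-Wheeler-based index). It assumes that the LTU of a read can be approximated as the lowest common ancestor of the annotation labels of all reference sequences the read matches. The toy taxonomy `PARENT`, the enzyme-code-like labels, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy functional taxonomy, stored as child -> parent.
# Assumption: the tree has a single root, whose parent is None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "1": "root",
    "1.1": "1",
    "1.1.1": "1.1",
    "1.1.1.1": "1.1.1",
    "1.1.1.2": "1.1.1",
    "1.2": "1",
    "2": "root",
}


def path_from_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list of nodes from the root down to `node`."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path[::-1]


def lowest_common_node(labels: Iterable[str],
                       parent: Dict[str, Optional[str]]) -> str:
    """Return the deepest node that is an ancestor of (or equal to) every label."""
    paths = [path_from_root(label, parent) for label in labels]
    if not paths:
        raise ValueError("at least one label is required")
    common = paths[0][0]  # the shared root
    # Walk down level by level while all paths agree.
    for level in zip(*paths):
        if all(n == level[0] for n in level):
            common = level[0]
        else:
            break
    return common


# A read whose best matches carry two sibling labels is assigned to their parent.
ltu = lowest_common_node(["1.1.1.1", "1.1.1.2"], PARENT)
```

In this sketch, a read that matches only one reference is assigned to that reference's own label, while a read whose matches span distant branches is pushed up toward the root, which is one simple way to resolve shared best matches.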