Abstract
Background: For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data.
Results: We present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU’s taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome’s functional profile with the OTU’s abundance observed in a given sample. We validated our method by comparing PanFP profiles, via Spearman correlation coefficients, with the functional profiles obtained from direct shotgun metagenomic measurement of 65 diverse communities. These correlations improved with increasing sequencing depth, falling within the range of 0.8–0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. However, our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases.
Conclusions: We developed an automated computational method that derives an inferred functional profile from 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub (https://github.com/srjun/PanFP).
Electronic supplementary material: The online version of this article (doi:10.1186/s13104-015-1462-8) contains supplementary material, which is available to authorized users.
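The two ideas in the Results, abundance-weighted profiles and validation by Spearman correlation, can be illustrated with a minimal Python sketch. This is not the PanFP implementation. The OTU names, KEGG-style function identifiers, pangenome weights, and metagenome counts below are hypothetical values chosen only for illustration, and the abstract does not specify how a pangenome's functional profile is normalized, so the weights are assumed to be simple per-function scores.

```python
from collections import defaultdict


def infer_sample_profile(otu_abundance, pangenome_profiles):
    """Sum each OTU's pangenome functional profile, weighted by its abundance."""
    profile = defaultdict(float)
    for otu, abundance in otu_abundance.items():
        # OTUs without a pangenome profile contribute nothing (an assumption).
        for function, weight in pangenome_profiles.get(otu, {}).items():
            profile[function] += abundance * weight
    return dict(profile)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical inputs: OTU counts in one sample and per-OTU pangenome weights.
otu_abundance = {"OTU_1": 10, "OTU_2": 4}
pangenome_profiles = {
    "OTU_1": {"K00001": 1.0, "K00002": 0.5},
    "OTU_2": {"K00002": 1.0, "K00003": 0.5},
}
inferred = infer_sample_profile(otu_abundance, pangenome_profiles)

# Hypothetical shotgun metagenome counts for the same functions.
metagenome = {"K00001": 120, "K00002": 95, "K00003": 30}
functions = sorted(metagenome)
rho = spearman(
    [inferred.get(f, 0.0) for f in functions],
    [metagenome[f] for f in functions],
)
```

Because Spearman correlation depends only on ranks, it compares the ordering of function abundances rather than their absolute scale, which suits a comparison between inferred profiles and read-count-based metagenome profiles.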
Highlights
For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions
Here, we present a new computational method called pangenome-based functional profile (PanFP), which infers the functional profiles of microbial communities based on 16S rRNA gene survey data
We validated PanFP against sequenced metagenomes and an existing method, PICRUSt [4], using 65 different environmental and mock community samples derived from the Human Microbiome Project (HMP) and other projects
Summary
For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. The 16S rRNA gene occurs universally and is highly conserved among all species of Bacteria and Archaea [3]. Using existing technologies, statistically robust data can be obtained for community studies over large numbers of samples, with thousands of 16S rRNA genes obtained for each sample. Metagenomics provides a direct view of the communities’ gene content and their functional capability. Such studies aim to understand how microbes interact and perform complex functions in a variety of environments by answering questions such as which species are present and abundant, and which functions are present or absent, based on identification of a panel of microbial organisms, genes, variants, pathways, or metabolic functions [5, 6]. Several recent studies have indicated that the functional content of complex communities remains more constant than the phylogenetic composition of the community [8, 9].