Abstract
BackgroundThe function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner.ResultsWe evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models.ConclusionsIn past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.
Highlights
The function of many genes is still not known even in model organisms
Thereby our methodology provides a probabilistic score for an assignment of each Gene Ontology (GO) term to a Cluster of Orthologous Groups (COG) gene family
We found that metagenome phyletic profiles (MPP) constructed from metagenomes representing distinct environments can predict distinct sets of GO functions: only 43% of the GO terms (232 of total 544) are learnable by both MPP-H and MPP-O (Fig. 1d)
Summary
The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. Many genes still have no known function or have only a very general role assigned. This holds true even for well-studied model organisms, where a quarter or more of the genes are poorly characterized [1,2,3]. Prominent examples include analyses of phylogenetic profiles ( called phyletic profiles (PP)), where functional associations between genes are inferred from similar patterns of occurrence of homologs across
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.