Abstract

Metagenomic sequencing provides information on the metabolic capacities and taxonomic affiliations of the members of a microbial community. When assessing metabolic functions in a community, genes can be missing from a pathway for two reasons: the genes may legitimately be absent from the community whose DNA was sequenced, or they may have been missed during shotgun sequencing or failed to assemble, so that the metabolic capacity of interest is wrongly absent from the sequence data. Here, we borrow and adapt occupancy modeling from macroecology to provide mathematical context to metabolic predictions from metagenomes. We review the five assumptions underlying occupancy modeling through the lens of microbial community sequence data. Using the methane cycle, we apply occupancy modeling to examine the presence and absence of methanogenesis and methanotrophy genes from nearly 10,000 metagenomes spanning global environments. We determine that methanogenesis and methanotrophy are positively correlated across environments, providing a predictive framework for assessing gene absences for these functions. We present this adaptation of macroecology’s occupancy modeling to metagenomics as a tool to quantify the uncertainty in predictions of the presence/absence of traits in environmental microbiological surveys. We further initiate a call for stronger metadata standards to accompany metagenome deposition, to enable robust statistical approaches in the future.

IMPORTANCE Metagenomics is maturing rapidly as a field but is hampered by a lack of available statistical tools. A primary area of uncertainty is around genes or functions that are missing from a metagenomic data set. Here, we borrow an established modeling approach from macroecology and adapt it to metagenomic data sets. Rather than making multiple sampling trips to a specific area to detect a species of interest (e.g., identifying a cardinal in a forest), we leverage the enormous amount of information within a metagenome and use multiple gene markers for a function of interest (e.g., subunits of an enzyme complex). We applied our adapted occupancy modeling to a case study examining methane cycling capacity. Our models show that methanogens and methanotrophs are each more likely to cooccur with the other guild than to be present in its absence. The lack of consistent and complete metadata is a significant hurdle for increasing the statistical rigor of metagenomic analyses.
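To make the analogy concrete, the following is a minimal sketch of the standard single-season occupancy likelihood from macroecology, with gene markers standing in for repeat site visits. It is not the implementation used in this study. It assumes a single occupancy probability `psi` (the function is truly present in a metagenome) and a single detection probability `p` (a given marker is recovered when the function is present), both constant across metagenomes and markers. The function names and the parameter values in the usage note are hypothetical and purely illustrative.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one metagenome's marker detection history.

    history: list of 0/1 values, one per gene marker (1 = marker detected).
    psi:     assumed probability that the function is truly present.
    p:       assumed per-marker detection probability given presence.
    """
    k = len(history)
    d = sum(history)
    # Function present, with d markers detected and k - d markers missed.
    present_term = psi * (p ** d) * ((1 - p) ** (k - d))
    if d > 0:
        # Any detection implies presence (no false positives assumed).
        return present_term
    # No detections: either present but missed every time, or truly absent.
    return present_term + (1 - psi)


def prob_present_given_no_detection(psi, p, n_markers):
    """Probability the function is present although no marker was detected."""
    missed = psi * (1 - p) ** n_markers
    return missed / (missed + (1 - psi))
```

For example, with the illustrative values `psi = 0.5` and `p = 0.5`, calling `prob_present_given_no_detection(0.5, 0.5, 1)` gives one third: a single undetected marker leaves substantial doubt about true absence, and that doubt shrinks as more markers are surveyed.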

