Abstract

As metagenomic studies continue to increase in number, sequence volume, and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor for meaningful data interpretation. To address this issue, we have developed JCVI Metagenomics Reports (METAREP) as an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software, including the implementation of dynamic weighting of taxonomic and functional annotations, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation is demonstrated through the application of multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by applying it to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads, as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations, including biological pathway and individual enzyme abundances, from hundreds of community samples is demonstrated through scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies give users a reference for how to conduct similar large-scale metagenomic analyses using METAREP with their own sequence data, while in this study they reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals.
Over one thousand HMP WGS datasets and the latest open source code are available at http://www.jcvi.org/hmp-metarep.

Highlights

  • Several large-scale metagenomic studies have been completed or are underway to investigate the genetic composition of microbes in their natural environment

  • Supported attributes range from organismal information (NCBI taxonomy), to functional description, Enzyme Classification (EC), Gene Ontology (GO) [20] or KEGG Orthology (KO) [21] as well as KEGG and MetaCyc [22] pathway assignments

  • The dendrogram topology by taxonomy (Figure 3a, Figure S2) grouped samples from identical or similar body habitats more consistently than the topology recovered by function (Figure 3b, Figure S3): oral sites were closest to one another, followed by samples from the anterior nares, skin, vagina, and stool

Summary

Introduction

Several large-scale metagenomic studies have been completed or are underway to investigate the genetic composition of microbes in their natural environment. As a multi-faceted community resource, the HMP includes taxonomic marker studies of 16S rRNA gene sequences [11] as well as a whole genome shotgun (WGS) data survey [10,12,13,14,15]. This WGS metagenomic data survey has examined the taxonomy and functional potential of microbial communities from 741 samples taken from up to fifteen body habitats of 108 healthy adult men and women, generating in total approximately 38 billion short read sequences (3.5 Tbp), of which over 14 billion sequences were processed and analyzed as part of this study. The software provides several options to quantify and visualize sample variation that can be used to test this hypothesis. To examine this question and to highlight the hierarchical clustering functionality of METAREP, the variation of taxonomic and pathway composition within and across body habitats and individual donors over two time points was investigated. These three oral body habitats were selected since they have the greatest representation of WGS data sets in the oral cavity and together constitute more than one-fourth of all HMP metabolic reconstruction datasets (Table 1).
