Abstract

The exponential rise of metagenomic sequencing is delivering massive volumes of functional environmental genomics data. However, it also creates a procedural bottleneck for ongoing re-analysis: as reference databases grow and methods improve, analyses need to be updated for consistency, which requires access to increasingly demanding bioinformatic and computational resources. Here, we present the KAUST Metagenomic Analysis Platform (KMAP), a new integrated, open, web-based tool for the comprehensive exploration of shotgun metagenomic data. We illustrate the capacities KMAP provides through the re-assembly of ~27,000 public metagenomic samples captured in ~450 studies sampled across ~77 diverse habitats. In this pilot study, a small subset of these metagenomic assemblies is grouped into 36 new habitat-specific gene catalogs, all based on full-length (complete) genes. Extensive taxonomic and gene annotations are stored in Gene Information Tables (GITs), a simple, tractable data integration format useful for analysis through the command line or for database management. The KMAP pilot study enables the exploration and comparison of microbial GITs across different habitats with over 275 million genes. KMAP data and analyses are available at https://www.cbrc.kaust.edu.sa/aamg/kmap.start.

Highlights

  • The exponential rise of metagenomic sequencing is delivering massive volumes of functional environmental genomics data

  • We report, as requirements for improved metagenomics: (a) the re-assembly of public metagenomic samples; (b) the creation of gene catalogs from diverse environments using a subset of assemblies, as a pilot study; (c) the re-annotation of existing gene catalogs for improved coverage; (d) the design of Gene Information Tables (GITs) to standardize shotgun metagenomic analysis and reporting and to allow reuse; (e) a comparison of KMAP annotation and exploration methods with other existing platforms for metagenomic analysis; and (f) KMAP capacities for targeted comparison and interrogation of key genes of interest, such as antibiotic resistance genes (ARGs), from different environments, accessible through microbial gene catalogs in the King Abdullah University of Science and Technology (KAUST) Metagenomic Analysis Platform (KMAP) database

  • To expand access to GITs and metagenomic data analytics to the larger scientific community, without the need for advanced computational skills or resources, we provide indexed GITs through KMAP’s online ‘Compare Module’ by extending and repurposing the standard framework of the Metagenomic Reports (MetaRep [21]) software
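
As a rough illustration of the kind of command-line analysis a tabular format such as a GIT could support, the sketch below counts genes per habitat whose annotation matches a keyword. The column names (`gene_id`, `habitat`, `taxon`, `annotation`), the sample rows, and the helper `count_genes_by_habitat` are all hypothetical and chosen only for illustration; they are not the actual KMAP GIT schema or data.

```python
import csv
import io
from collections import Counter

# Hypothetical GIT-like excerpt: tab-separated, one gene per row.
# Column names and rows are illustrative assumptions, not the KMAP schema.
GIT_TSV = (
    "gene_id\thabitat\ttaxon\tannotation\n"
    "g1\tmarine\tProteobacteria\tbeta-lactamase\n"
    "g2\tmarine\tCyanobacteria\tphotosystem II protein D1\n"
    "g3\tsoil\tActinobacteria\tbeta-lactamase class A\n"
    "g4\tsoil\tFirmicutes\tBeta-lactamase class C\n"
    "g5\tgut\tBacteroidetes\tglycoside hydrolase\n"
)


def count_genes_by_habitat(tsv_text, keyword):
    """Count rows per habitat whose annotation contains keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    needle = keyword.lower()
    counts = Counter()
    for row in reader:
        if needle in row["annotation"].lower():
            counts[row["habitat"]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Compare how often a gene family of interest appears in each habitat.
    print(count_genes_by_habitat(GIT_TSV, "lactamase"))
```

A real table would be read from a file rather than an inline string, and a catalog with hundreds of millions of genes would more plausibly be queried through an indexed database than scanned row by row.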

Introduction

The exponential rise of metagenomic sequencing is delivering massive volumes of functional environmental genomics data. Such a gene pool, coupled with sample metadata (e.g., temperature and salinity), can serve as a basis to accelerate discovery and applications for industries such as biotechnology, pharmaceutics, food, and energy. However, re-analysing these data as reference databases grow and methods improve is challenging: it requires advanced bioinformatics skills and computational resources to process all existing and new metagenomic samples through state-of-the-art methods for assembly, prediction of genes and metabolic processes, clustering, and functional annotation against updated reference databases. Another major challenge is the lack of standards in metagenomic data analysis, reporting, and data sharing for reproducibility, repurposing, and reuse [16].
