Abstract

Metagenomics became a standard strategy to comprehend the functional potential of microbial communities, including the human microbiome. Currently, the number of metagenomes in public repositories is increasing exponentially. The Sequence Read Archive (SRA) and the MG-RAST are the two main repositories for metagenomic data. These databases allow scientists to reanalyze samples and explore new hypotheses. However, mining samples from them can be a limiting factor, since the metadata available in these repositories is often misannotated, misleading, and decentralized, creating an overly complex environment for sample reanalysis. The main goal of the HumanMetagenomeDB is to simplify the identification and use of public human metagenomes of interest. HumanMetagenomeDB version 1.0 contains metadata of 69 822 metagenomes. We standardized 203 attributes, based on standardized ontologies, describing host characteristics (e.g. sex, age and body mass index), diagnosis information (e.g. cancer, Crohn's disease and Parkinson), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing attributes (e.g. sequencing platform, average length and sequence quality). Further, HumanMetagenomeDB version 1.0 metagenomes encompass 58 countries, 9 main sample sites (i.e. body parts), 58 diagnoses and multiple ages, ranging from just born to 91 years old. The HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.

Highlights

  • Metagenomics is the study of the genetic potential of a multispecies assemblage

  • We identified attributes related to the sample country of origin, coordinates, age, sex, body mass index, ethnicity and several other host characteristics listed in Supplementary Table S3

  • Most of the metadata originated from the Sequence Read Archive (SRA) repository, 95.4% (66 589) since it is increasing faster than MG-RAST

Read more

Summary

Introduction

Metagenomics is the study of the genetic potential of a multispecies assemblage. Its most frequent application is to define all genomes of a microbial community in a cultureindependent approach (1). In the past 20 years, highthroughput sequencing prices diminished to a point that metagenomics data increases exponentially (2). Metagenomes encompass an overwhelming volume of complex data, presenting challenges for storage and for metadata annotation and curation (3). Some repositories provide permanent storage for DNA sequencing. The major database is the Sequence Read Archive (SRA) (4), which is part of the International Nucleotide Sequence Database Collaboration (5), along with the European Nucleotide Archive (ENA) (6) and the DNA Data Bank of Japan (7). Other notable databases are the MG-RAST (8), the EBI Metagenomics, or MGnify (9), gcMeta (10), MSE (11) and Qiita (10)

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call