Abstract

Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which are then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and that 10% of the selected GSMs in a database should be detected for a confident positive call. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our results agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
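The core idea of a genome-specific k-mer marker can be illustrated with a minimal Python sketch. It does not reproduce the GSMer pipeline: the function names (`genome_specific_kmers`, `detected_fraction`, `call_present`) are hypothetical, the toy sequences use k = 4 rather than 50, only the forward strand and exact matches are considered, and the 10% cutoff is simply taken from the threshold quoted above.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def genome_specific_kmers(genomes, k):
    """Map each genome ID to the k-mers that occur in that genome and no other.

    genomes: dict of {genome_id: sequence string}. Illustrative only.
    """
    owners = defaultdict(set)  # k-mer -> set of genome IDs containing it
    for gid, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(gid)
    specific = {gid: set() for gid in genomes}
    for km, ids in owners.items():
        if len(ids) == 1:  # seen in exactly one genome
            specific[next(iter(ids))].add(km)
    return specific


def detected_fraction(markers, reads, k):
    """Fraction of markers that match exactly somewhere in at least one read."""
    if not markers:
        return 0.0
    seen = set()
    for read in reads:
        for km in kmers(read, k):
            if km in markers:
                seen.add(km)
    return len(seen) / len(markers)


def call_present(markers, reads, k, min_fraction=0.10):
    """Call a genome present when at least min_fraction of its markers are hit.

    The 0.10 default mirrors the 10% threshold quoted in the abstract.
    """
    return detected_fraction(markers, reads, k) >= min_fraction


# Toy data (assumed for illustration, not from the paper).
genomes = {"A": "ACGTACGGTT", "B": "ACGTTTGGCA"}
markers = genome_specific_kmers(genomes, k=4)
reads = ["TACGGT"]
print(call_present(markers["A"], reads, k=4))
print(call_present(markers["B"], reads, k=4))
```

In this toy case the 4-mer `ACGT` occurs in both genomes and is therefore excluded from both marker sets; only k-mers that belong to a single genome remain as candidate markers.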

Highlights

  • Microorganisms can be found in almost every environment of the Earth’s biosphere and are responsible for numerous biological activities including carbon and nitrogen cycling [1], organic contaminant remediation [2,3,4] and human health and disease

  • We developed a novel k-mer-based approach, termed GSMer, to identify genome-specific markers (GSMs) from currently sequenced microbial genomes, which could be used for accurate strain/species-level identification of microorganisms in metagenomes

  • Because a clear definition of microbial strains and species is still widely debated, strains and species here were defined based on the National Center for Biotechnology Information (NCBI) classification system, where the binomial nomenclature part defines a species and the ID followed by the binomial name defines a strain
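The naming convention in the last highlight can be sketched as a small Python helper. This is an assumption-laden illustration, not part of GSMer: the function name `split_ncbi_name` is hypothetical, and it simply treats the first two whitespace-separated words as the binomial species name and the remainder as the strain ID, which will not hold for every NCBI organism name (for example names with a "Candidatus" prefix).

```python
def split_ncbi_name(organism_name):
    """Split an organism name into (species, strain_id).

    Assumes the first two words form the binomial species name and
    whatever follows is the strain identifier (may be empty).
    """
    parts = organism_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least a genus and a species epithet")
    species = " ".join(parts[:2])
    strain_id = " ".join(parts[2:])
    return species, strain_id


print(split_ncbi_name("Escherichia coli K-12"))
```

Under this convention, two entries sharing the same binomial part count as the same species, while differing IDs after the binomial part mark distinct strains.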

Introduction

Microorganisms can be found in almost every environment of the Earth’s biosphere and are responsible for numerous biological activities including carbon and nitrogen cycling [1], organic contaminant remediation [2,3,4] and human health and disease. Many human disorders, such as type 2 diabetes (T2D), obesity, dental cavities, cancer and some immune-related diseases, are known to be associated with a single microorganism or a group of microorganisms [5,6,7,8,9,10,11]. It is necessary to use other molecular markers to identify and characterize microorganisms at the strain/species level in complex environments.
