Abstract

Bacteriophages have important roles in the ecology of the human gut microbiome but are under-represented in reference databases. To address this problem, we assembled the Metagenomic Gut Virus catalogue that comprises 189,680 viral genomes from 11,810 publicly available human stool metagenomes. Over 75% of genomes represent double-stranded DNA phages that infect members of the Bacteroidia and Clostridia classes. Based on sequence clustering we identified 54,118 candidate viral species, 92% of which were not found in existing databases. The Metagenomic Gut Virus catalogue improves detection of viruses in stool metagenomes and accounts for nearly 40% of CRISPR spacers found in human gut Bacteria and Archaea. We also produced a catalogue of 459,375 viral protein clusters to explore the functional potential of the gut virome. This revealed tens of thousands of diversity-generating retroelements, which use error-prone reverse transcription to mutate target genes and may be involved in the molecular arms race between phages and their bacterial hosts.

Highlights

  • There have been two main approaches for sequencing viral genomes from the microbiome: viral metagenomic sequencing and bulk metagenomic sequencing

  • We developed a viral detection pipeline for the current study using a combination of well-established methods and signatures, including VirFinder [32], viral protein families from the Earth’s Virome Study [23], and the propensity for viral genes to lie on the same strand [33] and be functionally unannotated [8] (Fig. 1a,b)

  • We applied our pipeline to bulk metagenomes from 11,810 distinct human gut samples that were assembled in previous studies [29,31,36] to broadly capture lytic and lysogenic DNA viruses (Fig. 1a and Supplementary Table 3)
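
Two of the signatures named in the highlights, genes lying on the same strand and genes lacking functional annotation, can be expressed as simple per-contig features. The sketch below is illustrative only: the `Gene` record, the function names, and the example contig are assumptions made here, not the authors' implementation, and no classification thresholds are implied.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Gene:
    """Hypothetical gene record for one predicted gene on a contig."""
    strand: int                 # +1 for forward strand, -1 for reverse strand
    annotation: Optional[str]   # None when no functional annotation was assigned


def strand_switch_rate(genes: Sequence[Gene]) -> float:
    """Fraction of adjacent gene pairs that lie on opposite strands.

    A low value means genes tend to share a strand, one of the
    signatures mentioned in the text.
    """
    if len(genes) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(genes, genes[1:]) if a.strand != b.strand)
    return switches / (len(genes) - 1)


def unannotated_fraction(genes: Sequence[Gene]) -> float:
    """Fraction of genes with no functional annotation."""
    if not genes:
        return 0.0
    return sum(1 for g in genes if g.annotation is None) / len(genes)


# Made-up contig: four genes, one strand switch, three unannotated genes.
example_contig = [
    Gene(strand=1, annotation=None),
    Gene(strand=1, annotation=None),
    Gene(strand=1, annotation="terminase"),
    Gene(strand=-1, annotation=None),
]

features = {
    "strand_switch_rate": strand_switch_rate(example_contig),
    "unannotated_fraction": unannotated_fraction(example_contig),
}
```

In a full pipeline, features like these would be combined with the other signals listed above (a VirFinder score and hits to viral protein families); how the study weights or thresholds them is not stated in this summary.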

Summary

Introduction

There have been two main approaches for sequencing viral genomes from the microbiome: viral metagenomic sequencing and bulk metagenomic sequencing. The number of publicly available bulk metagenomes has rapidly grown, as evidenced by recent, large-scale data mining efforts [29–31]. To expand these existing resources and provide a complementary view of the gut virome, we performed large-scale identification of viral genomes from 11,810 bulk metagenomes from human stool samples derived from 61 previously published studies. We used these data to form the Metagenomic Gut Virus (MGV) catalogue, which contains 189,680 viral draft genomes estimated to be >50% complete and representing 54,118 candidate viral species.
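
The two headline counts above (genomes estimated to be >50% complete, and the candidate species they represent) amount to a filter followed by a distinct count. A minimal sketch, assuming a hypothetical table of genome records with `completeness` (percent) and `species_cluster` fields; these field names and the sample rows are invented for illustration and are not the catalogue's actual schema.

```python
from typing import Iterable, Mapping, Tuple


def summarize_catalogue(
    records: Iterable[Mapping[str, object]],
    min_completeness: float = 50.0,
) -> Tuple[int, int]:
    """Return (number of genomes kept, number of distinct species clusters).

    A genome is kept when its estimated completeness is strictly greater
    than `min_completeness`, mirroring the ">50% complete" criterion.
    """
    kept = [r for r in records if float(r["completeness"]) > min_completeness]
    species = {r["species_cluster"] for r in kept}
    return len(kept), len(species)


# Made-up records: three pass the filter and fall into two species clusters.
sample_records = [
    {"genome_id": "g1", "completeness": 92.0, "species_cluster": "sp_A"},
    {"genome_id": "g2", "completeness": 61.5, "species_cluster": "sp_A"},
    {"genome_id": "g3", "completeness": 50.0, "species_cluster": "sp_B"},
    {"genome_id": "g4", "completeness": 75.0, "species_cluster": "sp_C"},
]

n_genomes, n_species = summarize_catalogue(sample_records)
```

Applied to the real catalogue, the same two numbers would correspond to the 189,680 genomes and 54,118 candidate species reported above.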

