Abstract

Our current knowledge about nucleocytoplasmic large DNA viruses (NCLDVs) is largely derived from viral isolates that are co-cultivated with protists and algae. Here we reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. Analysis of 58,023 major capsid proteins from large and giant viruses using metagenomic data revealed the global distribution patterns and cosmopolitan nature of these viruses. The discovered viral genomes encoded a wide range of proteins with putative roles in photosynthesis and diverse substrate transport processes, indicating that host reprogramming is probably a common strategy in the NCLDVs. Furthermore, inferences of horizontal gene transfer connected viral lineages to diverse eukaryotic hosts. We anticipate that the global diversity of NCLDVs that we describe here will establish giant viruses—which are associated with most major eukaryotic lineages—as important players in ecosystems across Earth’s biomes.

Highlights

  • We used a multistep metagenome data-mining, binning and iterative-filtering pipeline (Extended Data Figs. 1, 2 and Supplementary Text 1) to recover genomes representing 2,074 putative nucleocytoplasmic large DNA virus (NCLDV) populations from 8,535 publicly available metagenomes in the Integrated Microbial Genomes and Microbiomes (IMG/M) database[15]

  • Using an approach that relied on conserved nucleocytoplasmic virus orthologous genes (NCVOGs), we estimated genome completeness and contamination, which led to the classification of 773 high-quality, 989 medium-quality and 312 low-quality giant virus metagenome-assembled genomes (GVMAGs)

  • The greatest number of GVMAGs could be attributed to MGVL57 (n = 205), the Yellowstone Lake mimiviruses (YLMVs; n = 119) and MGVL42 (n = 84)
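The counts in the highlights above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers reported above; the variable and function names are illustrative and are not part of the published pipeline.

```python
# Counts reported in the highlights above (GVMAG quality tiers and the three
# largest lineages). All names here are illustrative, not from the paper's code.
quality_counts = {"high": 773, "medium": 989, "low": 312}
lineage_counts = {"MGVL57": 205, "YLMV": 119, "MGVL42": 84}

TOTAL_GVMAGS = 2074  # total number of recovered genomes stated in the abstract


def shares(counts, total):
    """Return each count as a percentage of `total`, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The three quality tiers should account for every recovered genome.
assert sum(quality_counts.values()) == TOTAL_GVMAGS

quality_share = shares(quality_counts, TOTAL_GVMAGS)
lineage_share = shares(lineage_counts, TOTAL_GVMAGS)
top3_total = sum(lineage_counts.values())

print("Quality tiers (% of all GVMAGs):", quality_share)
print("Largest lineages (% of all GVMAGs):", lineage_share)
print("GVMAGs in the three largest lineages:", top3_total)
```

The quality tiers sum to the 2,074 genomes stated in the abstract, and the three largest lineages together cover 408 of them, so most GVMAGs fall outside these three groups.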

Summary

Frederik Schulz[1] ✉, Simon Roux[1], David Paez-Espino[1], Sean Jungbluth[1], David A.

We reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. The assembly size, GC content, coding density and copy number of nucleocytoplasmic virus orthologous genes (NCVOGs)[16] were comparable to previously described NCLDV genomes, supporting the classification of these genomes as giant virus metagenome-assembled genomes (GVMAGs). Augmenting the existing NCLDV phylogenetic framework with the GVMAGs substantially increased the diversity of this proposed viral order (Fig. 1a and Supplementary Data 1).

Online content
Generation of models to detect NCLDV proteins
Targeted binning of putative NCLDV metagenome contigs
Filtering of GVMAGs
GVMAG quality on the basis of estimated completeness and contamination
Survey of the NCLDV MCP
NCLDV species tree
Protein trees
Findings
Code availability