Abstract

The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus-host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7-38% of 'unknown' sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus-host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.

Highlights

  • Over the past two decades, our collective understanding of microbial diversity has been profoundly expanded by cultivation-independent molecular methods (Pace, 1997; Whitman et al, 1998; Rappé and Giovannoni, 2003; DeLong, 2009; Hanson et al, 2012)

  • VirSorter identifies viral sequences through (i) statistical enrichment in viral gene content, using a reference database composed of viral genomes of archaeal and bacterial viruses from RefSeq and assembled from viral metagenomes, or (ii) a combination of viral ‘hallmark’ gene(s) that code for virion-related functions such as major capsid proteins or terminases (Koonin et al, 2006; Roux et al, 2014), and at least one viral-like genomic feature: statistical depletion in genes with a hit in the PFAM database, statistical enrichment in uncharacterized genes, short genes, or strand bias

  • While recent advances in high-throughput sequencing and viral metagenomics continue to expand the bounds of viral sequence space (e.g., Reyes et al, 2012; Mizuno et al, 2013; Brum and Sullivan, 2015), such viruses are typically unlinked to cognate hosts, severely limiting ecological and evolutionary inferences
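
The two-route detection rule described for VirSorter in the highlights can be sketched as a small predicate. This is only an illustration of the logic as stated in the text, not VirSorter's actual code or interface: the class name, field names, and function name are hypothetical, and each boolean stands in for the outcome of a statistical test that the tool would compute from the sequence and its reference databases.

```python
from dataclasses import dataclass


@dataclass
class RegionEvidence:
    """Signals for one genomic region (hypothetical field names)."""
    viral_gene_enrichment: bool        # enriched in genes matching the viral reference database
    hallmark_gene_count: int           # e.g. major capsid protein or terminase genes
    pfam_depletion: bool               # depleted in genes with a PFAM hit
    uncharacterized_enrichment: bool   # enriched in uncharacterized genes
    short_gene_enrichment: bool        # enriched in short genes
    strand_bias: bool                  # genes concentrated on one strand


def is_candidate_viral(ev: RegionEvidence) -> bool:
    """Apply the two routes from the text: (i) enrichment, or (ii) hallmark plus feature."""
    # Route (i): statistical enrichment in viral gene content is sufficient on its own.
    if ev.viral_gene_enrichment:
        return True
    # Route (ii): at least one hallmark gene AND at least one viral-like genomic feature.
    has_viral_like_feature = any([
        ev.pfam_depletion,
        ev.uncharacterized_enrichment,
        ev.short_gene_enrichment,
        ev.strand_bias,
    ])
    return ev.hallmark_gene_count >= 1 and has_viral_like_feature
```

Under this reading, a region with a terminase gene but no viral-like genomic feature would not be flagged, whereas the same region with strand bias would be.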

Introduction

Over the past two decades, our collective understanding of microbial diversity has been profoundly expanded by cultivation-independent molecular methods (Pace, 1997; Whitman et al, 1998; Rappé and Giovannoni, 2003; DeLong, 2009; Hanson et al, 2012). It is widely recognized that interconnected microbial communities drive matter and energy transformations in natural and engineered ecosystems (Falkowski et al, 2008), while contributing to health and disease states in multicellular hosts (Clemente et al, 2012). Concomitant with this changing worldview is a growing awareness that viruses modulate microbial interaction networks and long-term evolution, with resulting feedbacks on ecosystem functions and services (Suttle, 2007; Rodriguez-Valera et al, 2009; Forterre and Prangishvili, 2013; Hurwitz et al, 2013; Brum et al, 2014; Brum and Sullivan, 2015). Our limited understanding of viral diversity and virus–host interactions remains a major bottleneck in the development of predictive ecosystem models and unifying eco-evolutionary theories.
