Abstract

Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

Highlights

  • Viruses are the most abundant and diverse nucleic acid-based entities on Earth (Weinbauer, 2004; Edwards and Rohwer, 2005; Thurber, 2009)

  • Area of a genome coverage plot normalized to the total number of nucleotides in the phage genome. This value is similar to the total abundance of a phage in all metagenomes; it considers each nucleotide covered in the phage genome and not just the number of sequence reads that match that genome

  • Fraction of the phage genome that matches at least one metagenomic sequence

Read more

Summary

Introduction

Viruses are the most abundant and diverse nucleic acid-based entities on Earth (Weinbauer, 2004; Edwards and Rohwer, 2005; Thurber, 2009). Sequence-based strategies, notably the metagenomics technologies developed in the past decade (Breitbart et al, 2002; Breitbart and Rohwer, 2005), have revolutionized phage ecology (e.g., Breitbart et al, 2003; Angly et al, 2006; Thurber et al, 2009; Belcaid et al, 2010; Rodriguez-Brito et al, 2010; Swanson et al, 2011; Mizuno et al, 2013; Martinez Martinez et al, 2014) Despite those major advances, systematic surveys of phage genes and genomes in available metagenomes remain scarce partly because of the lack of well-established mathematical methods or metrics that define various aspects of phage distribution, abundance, and gene coverage

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call