Despite many efforts to understand and leverage the functional potential of environmental viromes, most bacteriophage genes remain uncharacterized. Metagenomics has emerged as a powerful tool for exploring novel biology from uncultivated microbes like phages, allowing new genes to be mined directly without the need to culture the diverse microbiota and the viruses within. When a purely computational approach cannot infer gene function, it may be necessary to create a DNA library from environmental genomic DNA and then screen that library for a particular function. However, these screens are often initiated without a metagenomic analysis of the completed DNA library being reported. Here, we describe the construction and characterization of DNA libraries from a single cultured phage (ΦT4), five cultured Escherichia coli phages, and three metagenomic viral sets built from freshwater, seawater, and wastewater samples. Through next-generation sequencing of five independent samplings of the libraries, we found a consistent number of recovered genes per replicate for each library, with many genes classifiable via the KEGG and Pharokka databases. By characterizing the size of the genes and inserts, we found that our libraries contain a median of one to two genes per contig, with median gene lengths of 303-381 bp across all libraries, reflective of the small genomes of viruses. The environmental libraries were more genetically diverse than the single-phage and multi-phage libraries. Additionally, we found reduced coverage of individual genomes when five phages were used as opposed to one. Taken together, this work provides a comprehensive analysis of DNA libraries from phage genomes that can be used for metagenomic exploration and functional screens to infer and identify new biology.

IMPORTANCE Functional metagenomics is an approach that aims to characterize the putative biological function of genes in the microbial world. This includes an examination of the sequencing data collected from a pooled source of diverse microbes and inference of gene function by comparison to annotated and studied genes from public databases. At times, DNA libraries are made from these genes, and the library is screened for a specific function. Hits are validated using a combination of biological, computational, and structural analyses. Left unresolved is a detailed characterization of the library, both its diversity and content, for the purposes of imputing function entirely by computational means, a process that may yield findings that aid in designing useful screens to identify novel gene functions. In this study, we constructed libraries from cultured phages and uncultured viromes from the environment and characterized some important parameters, such as gene number, genes per contig, ratio of hypothetical to known proteins, total genomic coverage and recovery, and the effect of pooling genetic information from multiple sources, to provide a better understanding of the nature of these libraries. This work will aid the design and implementation of future screens of pooled DNA libraries to discover and isolate viral genes with novel biology across various biomes.