Abstract
The complete assembly of viral genomes from metagenomic datasets (short genomic sequences gathered from environmental samples) has proven to be challenging, so there are significant blind spots when we view viral genomes through the lens of metagenomics. One approach to overcoming this problem is to leverage the thousands of complete viral genomes that are publicly available. Here we describe our efforts to assemble a comprehensive resource that provides a quantitative snapshot of viral genomic trends - such as gene density, noncoding percentage, and abundances of functional gene categories - across thousands of viral genomes. We have also developed a coarse-grained method for visualizing viral genome organization for hundreds of genomes at once, and have explored the extent of the overlap between bacterial and bacteriophage gene pools. Existing viral classification systems were developed prior to the sequencing era, so we present our analysis in a way that allows us to assess the utility of the different classification systems for capturing genomic trends.
Highlights
There are an estimated 10^31 virus-like particles inhabiting our planet, outnumbering all cellular life forms (Suttle, 2005; Wigington et al, 2016)
Because we aimed to study gene order sequences across different viral groups, we focused on genes whose functions are universally required, namely structural genes. textFile-1.txt provides the structural gene order sequences for all viruses, though the script we developed can be modified to visualize the placement of any number of genes or user-defined gene groups
It is important to note that while the National Center for Biotechnology Information (NCBI) viral database represents a large collection of complete viral genomes, it still represents a small fraction of the total viral diversity in nature
Summary
There are an estimated 10^31 virus-like particles inhabiting our planet, outnumbering all cellular life forms (Suttle, 2005; Wigington et al, 2016). Despite their presence in astonishing numbers and their impact on the population dynamics and evolutionary trajectories of their hosts, our quantitative knowledge of trends in the genomic properties of viruses remains limited, with many of the key quantities used to characterize these genomes either scattered across the literature or unavailable altogether. This is in contrast to the growing ability, exhibited in resources such as the BioNumbers database (Milo et al, 2010), to assemble in one curated collection the key numbers that characterize cellular life forms. Such advances allow us to appreciate the genomic diversity that is a hallmark of viral genomes (Paez-Espino et al, 2016; Edwards and Rohwer, 2005; Rohwer and Thurber, 2009; Simmonds et al, 2017; Simmonds, 2015; Mokili et al, 2012) and make it possible to assemble some of the key numbers of virology.
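To make two of the quantities named in the abstract concrete, the following is a minimal sketch of how gene density and noncoding percentage could be computed for a single annotated genome. The definitions used here are assumptions for illustration, not the authors' stated method: gene density is taken as genes per kilobase, and noncoding percentage as the share of bases not covered by any coding region, with overlapping genes counted once. The function names and the toy coordinates are hypothetical, and genes that wrap around the origin of a circular genome are not handled.

```python
from typing import Dict, List, Tuple

# Assumed convention: 1-based, inclusive (start, end) coordinates of a coding region.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def genome_stats(genome_length: int, genes: List[Interval]) -> Dict[str, float]:
    """Return gene density (genes per kb) and noncoding percentage for one genome."""
    coding_bp = sum(end - start + 1 for start, end in merge_intervals(genes))
    return {
        "gene_density_per_kb": len(genes) / (genome_length / 1000),
        "noncoding_percent": 100 * (genome_length - coding_bp) / genome_length,
    }


# Hypothetical 10 kb genome with three genes, two of which overlap by 101 bp.
toy = genome_stats(10_000, [(1, 3000), (2900, 5000), (6001, 9000)])
print(toy)
```

Merging intervals before summing matters for viruses in particular, since overlapping genes are common in compact genomes and a naive sum of gene lengths would understate the noncoding fraction.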