Abstract

The genomes of cellular organisms display CpG and TpA dinucleotide composition biases. Such biases have been poorly investigated in dsDNA viruses. Here, we show that in dsDNA virus, bacterial, and eukaryotic genomes, the representation of TpA and CpG dinucleotides is strongly dependent on genomic G + C content. Thus, the classical observed/expected ratios do not fully capture dinucleotide biases across genomes. Because a larger portion of the variance in TpA frequency was explained by G + C content, we explored which additional factors drive the distribution of CpG dinucleotides. Using the residuals of the linear regressions as a measure of dinucleotide abundance and ancestral state reconstruction across eukaryotic and prokaryotic virus trees, we identified an important role for phylogeny in driving CpG representation. Nonetheless, phylogenetic ANOVA showed that a few host associations also account for significant variation. Among eukaryotic viruses, the most significant differences were observed between arthropod-infecting viruses and viruses that infect vertebrates or unicellular organisms. However, an effect of viral DNA methylation status (driven either by the host or by virus-encoded methyltransferases) is also likely. Among prokaryotic viruses, cyanobacteria-infecting phages were significantly CpG-depleted, whereas phages that infect bacteria in the genera Burkholderia and Staphylococcus were CpG-rich. Comparison with bacterial genomes indicated that this effect is largely driven by the general tendency of phages to resemble the host's genomic CpG content. Notably, this tendency is stronger for temperate than for lytic phages. Our data shed light on the processes that shape virus genome composition and inform manipulation strategies for biotechnological applications.
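
The two measures contrasted in the abstract can be illustrated with a minimal Python sketch. It computes the classical observed/expected ratio for a dinucleotide and, as an alternative, the residuals of a least-squares regression of dinucleotide frequency on G + C content. The abstract does not specify the regression model, so the simple one-predictor fit used here is an assumption, and the function names and toy sequences are hypothetical and chosen only for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_frequency(seq, dinuc):
    """Frequency of a dinucleotide among all overlapping adjacent pairs."""
    seq = seq.upper()
    pairs = len(seq) - 1
    hits = sum(1 for i in range(pairs) if seq[i:i + 2] == dinuc)
    return hits / pairs


def observed_expected(seq, dinuc):
    """Classical O/E ratio: f(XY) / (f(X) * f(Y))."""
    seq = seq.upper()
    f_x = seq.count(dinuc[0]) / len(seq)
    f_y = seq.count(dinuc[1]) / len(seq)
    return dinucleotide_frequency(seq, dinuc) / (f_x * f_y)


def regression_residuals(x, y):
    """Residuals of an ordinary least-squares fit of y on x.

    Assumption: one predictor (G + C content) and a straight-line fit.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy sequences (made up for illustration, not real genomes).
toy_genomes = {
    "toy_A": "ATGCGCGATATTA",
    "toy_B": "GGCCGCGCATGC",
    "toy_C": "ATATATTACGAT",
}

gc = [gc_content(s) for s in toy_genomes.values()]
cpg = [dinucleotide_frequency(s, "CG") for s in toy_genomes.values()]
residuals = regression_residuals(gc, cpg)

for name, oe, res in zip(
    toy_genomes,
    (observed_expected(s, "CG") for s in toy_genomes.values()),
    residuals,
):
    print(f"{name}: CpG O/E = {oe:.2f}, residual = {res:+.3f}")
```

In this sketch, a positive residual marks a sequence with more CpG than the fitted line predicts for its G + C content, and a negative residual marks relative depletion. Unlike the O/E ratio, the residual is defined relative to the other genomes in the data set, so its value depends on which genomes are included in the fit.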
