Abstract

A subset of guanine-rich nucleic acid sequences has the potential to fold into G-quadruplex (G4) secondary structures, which are functionally important for several biological processes, including genome stability and regulation of gene expression. Putative quadruplex sequences (PQSs) of the form G3+ N1–7 G3+ N1–7 G3+ N1–7 G3+ are widely found in eukaryotic and prokaryotic genomes, but the base composition of the N1–7 loops is biased across species. Since viruses partially hijack their hosts’ cellular machinery for proliferation, we examined the PQS motif size, loop length, and nucleotide composition of 7370 viral genome assemblies and compared viral and host PQS motifs. We studied seven viral taxa infecting five distant eukaryotic hosts and created a resource providing a comprehensive view of the viral quadruplex motifs. Overall, short-looped PQSs predominate and have a similar composition across viral taxonomic groups, although subtle trends emerge upon classification by host. Specifically, there is a higher frequency of pyrimidine loops in viruses infecting animals, irrespective of the viruses’ genome type. This observation is confirmed by an in-depth analysis of the Herpesviridae family of viruses, which showed a distinctive accumulation of thermally stable C-looped quadruplexes in viruses infecting higher-order vertebrates. The occurrence of viral C-looped G4s, which carry binding sites for host transcription factors, as well as the high prevalence of viral TTA-looped G4s, which are identical to vertebrate telomeric motifs, provide concrete examples of how PQSs may help viruses impinge upon, and benefit from, host functions. More generally, these observations suggest a co-evolution of virus and host PQSs, thus underscoring the potential functional significance of G4s.

Highlights

  • G-quadruplexes (G4s) are alternative DNA or RNA secondary structures formed by the stacking of planar arrangements of guanine residues, further stabilized by monovalent cations [1]

  • We analyzed several viral genome metrics: genome size, which varies from 0.2 kbp to over 2400 kbp; GC content, which varies from 17.8% to 76.1%; and putative quadruplex-forming sequence (PQS) density (PQS/kbp), which allows the quadruplex content of each assembly to be compared independently of genome length, as well as PQS presence on the positive (G-rich) or negative (C-rich) strand (Materials and Methods)

  • The Herpesviridae family, further analyzed hereafter, exhibits the highest PQS content: we found a total of 6735 motifs, with an average density of 0.45 ± 0.60 PQS/kbp and up to 2.8 PQS/kbp in the Papiine alpha herpesvirus 2 (Table S2)
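The two length-independent metrics named in the highlights, GC content (%) and PQS density (PQS/kbp), can be illustrated with a minimal Python sketch. The function names `gc_content` and `pqs_density` are hypothetical and chosen for illustration only; this is not the authors' pipeline, and it assumes the PQS count has already been obtained by some motif search.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T, U).

    Ambiguity codes such as N are ignored (an assumption of this sketch).
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    total = gc + s.count("A") + s.count("T") + s.count("U")
    return 100.0 * gc / total if total else 0.0


def pqs_density(n_pqs: int, genome_length_bp: int) -> float:
    """Return the number of PQS motifs per kilobase pair (PQS/kbp)."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_pqs / (genome_length_bp / 1000.0)


if __name__ == "__main__":
    # Toy values only; they are not taken from any assembly in the study.
    print(gc_content("GGCCAATT"))      # half of the bases are G or C
    print(pqs_density(28, 10_000))     # 28 motifs in a 10 kbp genome
```

Normalizing by length in this way is what makes a 0.2 kbp genome and a 2400 kbp genome comparable on the same scale.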

Summary

Introduction

G-quadruplexes (G4s) are alternative DNA or RNA secondary structures formed by the stacking of planar arrangements of guanine residues, further stabilized by monovalent cations [1]. The importance of quadruplex-forming sequences as regulatory elements has been supported by extensive evidence in eukaryotic cells [2,3,4]. The consensus sequence motif G3+ N1–7 G3+ N1–7 G3+ N1–7 G3+ has been used to identify PQSs [5,10]. This has led to an estimate of over 400,000 PQSs in the human reference genome, with a median density of 0.5 motifs per kbp. In other eukaryotes and in bacteria, the density of G4 motifs is highly variable (2.5 to >0.1 motifs per kbp) [6].
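The consensus motif G3+ N1–7 G3+ N1–7 G3+ N1–7 G3+ translates directly into a regular expression. The following Python sketch shows one way such a search might be written; it is an illustration under stated assumptions, not the tool used in the study. The names `find_pqs`, `G_PATTERN`, and `C_PATTERN` are hypothetical, matches are non-overlapping (the behavior of `re.finditer`), and a C-run motif on the given strand is reported as a PQS on the complementary (negative) strand.

```python
import re
from typing import List, Tuple

# Four runs of at least three G, separated by loops of 1 to 7 nucleotides.
G_PATTERN = re.compile(r"G{3,}(?:[ACGTU]{1,7}G{3,}){3}")
# The same motif written with C: a G-rich PQS on the complementary strand.
C_PATTERN = re.compile(r"C{3,}(?:[ACGTU]{1,7}C{3,}){3}")


def find_pqs(seq: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, strand, motif) for each non-overlapping match.

    Coordinates are 0-based and end-exclusive on the given sequence.
    Strand is "+" for G-run motifs and "-" for C-run motifs.
    """
    s = seq.upper()
    hits = []
    for strand, pattern in (("+", G_PATTERN), ("-", C_PATTERN)):
        for m in pattern.finditer(s):
            hits.append((m.start(), m.end(), strand, m.group()))
    return sorted(hits)


if __name__ == "__main__":
    # Vertebrate telomeric repeat with TTA loops, flanked by arbitrary bases.
    example = "AAA" + "GGGTTAGGGTTAGGGTTAGGG" + "TTT"
    for hit in find_pqs(example):
        print(hit)
```

Because the quantifiers are greedy and matches do not overlap, long G-rich tracts are reported as a single motif; counting every overlapping candidate would require a different search strategy.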
