Abstract

CpG and UpA dinucleotides are under-represented in vertebrate genomes, whereas most invertebrates only show a bias against UpA. RNA viruses are thought to have evolved genomes that resemble the dinucleotide composition of their hosts, possibly to avoid restriction by the zinc-finger antiviral protein (ZAP). By performing a comprehensive analysis of RNA viruses, we show that, whereas UpA dinucleotides are similarly under-represented irrespective of viral genome composition or host, important differences are observed for CpG. The tendency for vertebrate-infecting viruses to have stronger CpG bias than invertebrate-infecting viruses is not universal. Rather, it is mainly driven by single-stranded (ss) RNA(+) viruses. Conversely, ssRNA(-) viruses have a dinucleotide composition that is unrelated to the host clade. Also, these viruses, especially those in the order Bunyavirales, are extremely CpG-depleted. By focusing on specific viral families, we also show that, even for vertebrate ssRNA(+) viruses, ZAP is unlikely to be a driver of CpG depletion. Consistently, CpG dinucleotides tend to be preferentially depleted in A/U-rich contexts in both vertebrate- and invertebrate-infecting viruses. Finally, within the same viral genomes, individual viral open reading frames (ORFs) can display different CpG content. Analysis of SARS-CoV-2 revealed a remarkable depletion of CpG dinucleotides in ORF1ab and S, but not in N and M. Thus, these results do not support the view that an adaptive shift for CpG depletion in the SARS-CoV-2 lineage occurred as an innate immunity evasion strategy. Our data provide a better understanding of viral evolution and inform approaches based on the modulation of CpG to generate attenuated viruses.

Importance

Akin to a molecular signature, dinucleotide composition can be exploited by the zinc-finger antiviral protein (ZAP) to restrict CpG-rich (and UpA-rich) RNA viruses. ZAP evolved in tetrapods, and it is not encoded by invertebrates and fish. Because a systematic analysis is missing, we analyzed the genomes of RNA viruses that infect vertebrates or invertebrates. We show that vertebrate single-stranded (ss) RNA(+) viruses and, to a lesser extent, double-stranded RNA viruses tend to have stronger CpG bias than invertebrate viruses. Conversely, ssRNA(-) viruses have similar dinucleotide composition whether they infect vertebrates or invertebrates. Analysis of ssRNA(+) viruses that infect mammals, reptiles, and fish indicated that ZAP is unlikely to be a major driver of CpG depletion. We also show that, compared to other coronaviruses, the genome of SARS-CoV-2 is not homogeneously CpG-depleted. Our study provides new insights into virus evolution and strategies for recoding RNA virus genomes.
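
As an illustration of what dinucleotide under-representation means in practice, the sketch below computes a common observed/expected odds ratio for a dinucleotide such as CpG or UpA. The abstract does not state which bias measure the study used, so the formula here (dinucleotide frequency divided by the product of the two mononucleotide frequencies) is an assumption, and the function name `dinucleotide_odds_ratio` is hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq: str, dinuc: str) -> float:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for one dinucleotide.

    Assumed measure for illustration only; not taken from the study.
    DNA-style input is accepted: T is converted to U before counting.
    """
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two nucleotides")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    # Mononucleotide frequencies over the whole sequence.
    base_counts = Counter(seq)
    f_x = base_counts[dinuc[0]] / n
    f_y = base_counts[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        # The expected frequency is zero, so the ratio is undefined.
        return float("nan")

    # Dinucleotide frequency over all n - 1 overlapping positions.
    pair_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    f_pair = pair_count / (n - 1)
    return f_pair / (f_x * f_y)


if __name__ == "__main__":
    toy_genome = "AUGGCUAAUGCCGAUUUAACGGAUAUUAAGC"  # made-up sequence
    for pair in ("CG", "UA"):
        print(pair, round(dinucleotide_odds_ratio(toy_genome, pair), 3))
```

Under this measure, values below 1 indicate that a dinucleotide occurs less often than expected from base composition alone, and values above 1 indicate over-representation. Applying the function separately to each ORF of a genome is one simple way to look for the kind of within-genome heterogeneity described for SARS-CoV-2.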
