Abstract
Identification of the full complement of genes and other functional elements in any virus is crucial for fully understanding its molecular biology and guiding the development of effective control strategies. RNA viruses have compact multifunctional genomes that frequently contain overlapping genes and non-coding functional elements embedded within protein-coding sequences. Overlapping features often escape detection because it can be difficult to disentangle the multiple roles of the constituent nucleotides via mutational analyses, while high-throughput experimental techniques are often unable to distinguish functional elements from incidental features. However, RNA viruses evolve so rapidly that, even within a single species, substitutions quickly accumulate at neutral or near-neutral sites, providing great potential for comparative genomics to distinguish the signature of purifying selection. Computationally identified features can then be efficiently targeted for experimental analysis. Here we analyze alignments of protein-coding virus sequences to identify regions where there is a statistically significant reduction in the degree of variability at synonymous sites, a characteristic signature of overlapping functional elements. Having previously tested this technique by experimental verification of discoveries in selected viruses, we now analyze sequence alignments for ∼700 RNA virus species to identify hundreds of such regions, many of which have not been previously described.
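The idea of looking for reduced variability at synonymous sites can be illustrated with a deliberately crude sketch. This is not the method used in the paper: it ignores phylogeny and computes no P-values, both of which the text treats as essential. It simply scores, for each codon column of an in-frame alignment, the fraction of same-amino-acid sequence pairs whose codons differ, and then averages that score in sliding windows. The function names `synonymous_diversity` and `window_means` are hypothetical, and the standard genetic code is assumed.

```python
from itertools import combinations, product

# Standard genetic code (assumption), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_diversity(alignment):
    """Per codon column, return the fraction of sequence pairs encoding the
    same amino acid whose codons nevertheless differ (None if no such pairs).

    `alignment` is a list of equal-length, in-frame DNA strings; codons with
    gaps or ambiguous bases are skipped.
    """
    n_codons = min(len(seq) for seq in alignment) // 3
    scores = []
    for i in range(n_codons):
        codons = [seq[3 * i:3 * i + 3].upper() for seq in alignment]
        codons = [c for c in codons if c in CODON_TABLE]
        same_aa = 0
        differ = 0
        for a, b in combinations(codons, 2):
            if CODON_TABLE[a] == CODON_TABLE[b]:
                same_aa += 1
                if a != b:
                    differ += 1
        scores.append(differ / same_aa if same_aa else None)
    return scores


def window_means(scores, window):
    """Mean of the defined scores in each sliding window of `window` codons."""
    means = []
    for start in range(len(scores) - window + 1):
        values = [v for v in scores[start:start + window] if v is not None]
        means.append(sum(values) / len(values) if values else None)
    return means
```

Windows whose mean falls well below the alignment-wide average would be candidates for a closer look, but without a phylogeny-aware null model such a dip cannot be called statistically significant.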
Highlights
With the notable exception of smallpox virus, the majority of viruses with the potential to cause acute fatal disease in healthy adult humans are RNA viruses
For Venezuelan equine encephalitis alphavirus (VEEV), the analysis revealed all of the known functional elements: the 51-nt CSE [43], the packaging signal [44,45], an extended stem-loop structure that mediates stop-codon readthrough [24], the 5’ end of the subgenomic RNA (sgRNA) promoter [46], and the overlapping TF ORF and associated -1 frameshift-stimulating elements within the 6K region [2] (Figure 2).
While related methods have been developed by others and used for the analysis of selected virus genomes, including Hepatitis C virus [87,88,89,90], GB virus C [20], some potyvirids [21], pestiviruses and enteroviruses [89], influenza A virus (IAV) [22], caliciviruses [64], Human immunodeficiency virus 1 [91,92], Rotavirus A [93] and hepatitis E virus (HEV) [94], most of this previous work does not incorporate phylogeny, does not involve calculation of P-values, or both.
Summary
With the notable exception of smallpox virus, the majority of viruses with the potential to cause acute fatal disease in healthy adult humans are RNA viruses. Such viruses include influenza A virus (IAV), Ebola virus, rabies virus, SARS virus, MERS virus, Japanese encephalitis virus, yellow fever virus, dengue virus, eastern equine encephalitis virus and Lassa virus. Many other human pathogenic viruses are RNA viruses, including poliovirus, hepatitis A virus, hepatitis C virus, hepatitis E virus (HEV), rubella virus, chikungunya virus, Norwalk virus, mumps virus and measles virus. The combined impact of RNA viruses, both economically and in terms of human suffering, is immense.