Abstract

A well-known mechanism by which new protein-coding genes originate is the modification of pre-existing genes, e.g. by duplication or horizontal transfer. In contrast, many viruses generate protein-coding genes de novo, by overprinting a new reading frame onto an existing (“ancestral”) frame. This mechanism is thought to play an important role in viral pathogenicity but has been poorly explored, perhaps because identifying de novo frames is very challenging; a new approach to detect them was therefore needed. We assembled a reference set of overlapping genes for which we could reliably determine the ancestral frame, and found that the codon usage of ancestral frames was significantly closer to that of the rest of the viral genome than was the codon usage of de novo frames. Based on this observation, we designed a method that identifies de novo frames from their codon usage with very good specificity but intermediate sensitivity. Using our method, we predicted that the Rex gene of deltaretroviruses originated de novo by overprinting the Tax gene. Intriguingly, several genes in the same genomic region have also originated de novo and encode proteins that regulate the functions of Tax. Such “gene nurseries” may be common in viral genomes. Finally, our results confirm that genomic GC content is not the only determinant of codon usage in viruses and suggest that a constraint linked to translation must influence codon usage.
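The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`codon_frequencies`, `pearson`, `rank_frames`), the use of raw relative codon frequencies, and the choice of a Pearson correlation as the measure of "closeness" are all assumptions made for illustration, and no significance test is included. The sketch only shows the underlying idea, namely that the frame whose codon usage is closer to that of the rest of the genome is the candidate ancestral frame.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(seq, frame=0):
    """Relative frequency of each of the 64 codons, read from offset `frame`.

    Codons containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("sequence contains no complete unambiguous codon")
    return [counts[c] / total for c in CODONS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumed similarity measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def rank_frames(frame_a_cds, frame_b_cds, genome_rest_cds):
    """Return (candidate_ancestral, r_a, r_b).

    Each argument is a coding sequence already in its own reading frame.
    The frame whose codon usage correlates better with the rest of the
    genome is reported as the candidate ancestral frame (hypothetical rule,
    no statistical test applied).
    """
    reference = codon_frequencies(genome_rest_cds)
    r_a = pearson(codon_frequencies(frame_a_cds), reference)
    r_b = pearson(codon_frequencies(frame_b_cds), reference)
    return ("a" if r_a >= r_b else "b"), r_a, r_b


if __name__ == "__main__":
    # Toy sequences, invented purely to exercise the functions.
    genome_rest = "GCT" * 10 + "AAA" * 5
    frame_a = "GCT" * 4 + "AAA" * 2
    frame_b = "CCC" * 6
    print(rank_frames(frame_a, frame_b, genome_rest))
```

In practice, a real analysis would need many more codons per frame than this toy input, a measure of codon usage that accounts for amino acid composition, and a statistical criterion for deciding when the difference between the two frames is meaningful.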

Highlights

  • Modification of existing genes, such as by duplication or fusion, is a common and well-understood mechanism by which protein-coding genes originate [1,2]

  • The 27 overlaps come from 25 genera, distributed in 16 viral families covering a wide range of viruses (Table 1). 18 overlaps involve one gene being completely overlapped by the other, while in 9 the overlap is partial (e.g. Figure 2)

  • To be confident about the taxonomic distribution of each frame, we carried out extensive searches involving the most up-to-date

Introduction

Modification of existing genes, such as by duplication or fusion, is a common and well-understood mechanism by which protein-coding genes originate [1,2]. Studying de novo proteins should greatly enhance our understanding of host-pathogen co-evolution and our knowledge of the function and structure of viral proteins [3,10,11,12,13,14]. Finding that a viral protein has no detectable sequence homolog does not reliably indicate that it originated de novo, because viral proteins evolve so fast that they can diverge in sequence beyond recognition. To circumvent this problem, in our previous work [3,4] and in the current study, we focused on a special case of de novo proteins: those generated by overprinting. Because overlapping genes are abundant in viruses [15,16,17], they constitute a rich source of de novo proteins.
