Abstract

Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called synonymous dinucleotide usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two datasets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. In both datasets, the SDU shows that the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides at all three codon positions. Compared with existing metrics for dinucleotide quantification, the SDU allows its values to be interpreted statistically against a null expectation derived from the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way.
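The comparison against equal synonymous codon usage can be illustrated with a short Python sketch. This is not DinuQ's code or API: the function name `sdu_sketch`, the restriction to amino acids whose synonymous codons differ in whether they carry the dinucleotide, and the weighting of per-amino-acid observed/expected ratios by amino-acid counts are all assumptions made for illustration. It covers only the two within-codon positions (1-2 and 2-3) and uses the standard genetic code, writing U as T.

```python
import itertools
import math
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}

# Group sense codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def sdu_sketch(seq, dinuc="CG", frame=0):
    """Hypothetical SDU-like score for a within-codon dinucleotide.

    frame=0 looks at codon positions 1-2, frame=1 at positions 2-3.
    1.0 means the dinucleotide occurs as often as equal synonymous codon
    usage predicts; values below 1.0 suggest under-representation.
    """
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    weighted_sum = total_weight = 0.0
    for aa, syn in SYNONYMS.items():
        carriers = [c for c in syn if c[frame:frame + 2] == dinuc]
        # Null expectation: every synonymous codon is used equally often.
        expected = len(carriers) / len(syn)
        if not 0 < expected < 1:
            continue  # codon choice cannot change the dinucleotide for this amino acid
        n_aa = sum(counts[c] for c in syn)
        if n_aa == 0:
            continue  # amino acid absent from the sequence
        observed = sum(counts[c] for c in carriers) / n_aa
        weighted_sum += n_aa * observed / expected
        total_weight += n_aa
    return weighted_sum / total_weight if total_weight else math.nan
```

For example, `sdu_sketch(cds, "CG", frame=0)` scores CpG at codon positions 1-2 (only arginine codons can carry it there), while `frame=1` scores positions 2-3. Sequences in which no relevant amino acid occurs return `nan` rather than a misleading 0.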

Highlights

  • Certain dinucleotides, two nucleotides adjacent in a sequence, are known to be over- or under-represented in the genomes of living organisms, creating distinct compositional patterns [1]. Organisms with methylated genomes such as vertebrates and plants have low levels of CpG dinucleotides.

  • In this paper we propose synonymous dinucleotide usage (SDU) as a novel method for quantifying dinucleotide representation in a coding sequence by comparing the observed frequency of a synonymous dinucleotide to that expected under the null hypothesis of equal synonymous codon usage.

  • We further extend this metric…
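The same null hypothesis of equal synonymous codon usage can also be applied to dinucleotides that span two codons (codon position 3 followed by position 1 of the next codon). The sketch below assumes that the two adjacent amino acids choose their synonymous codons independently and equally, and reports a pooled observed/expected ratio; the function `bridge_oe` and this pooling are hypothetical illustrations, not DinuQ's implementation. U is written as T.

```python
import itertools
import math
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}

SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def base_fraction(aa, pos, base):
    """Share of the synonymous codons of `aa` with `base` at codon index `pos`."""
    syn = SYNONYMS[aa]
    return sum(c[pos] == base for c in syn) / len(syn)


def bridge_oe(seq, dinuc="CG"):
    """Hypothetical observed/expected ratio for a dinucleotide spanning codons 3-1."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    observed = expected = 0.0
    for c1, c2 in zip(codons, codons[1:]):
        aa1, aa2 = CODON_TABLE.get(c1, "*"), CODON_TABLE.get(c2, "*")
        if "*" in (aa1, aa2):
            continue  # skip stop codons and ambiguous codons
        # Probability of the bridge dinucleotide if both amino acids
        # used all of their synonymous codons equally and independently.
        e = base_fraction(aa1, 2, dinuc[0]) * base_fraction(aa2, 0, dinuc[1])
        if not 0 < e < 1:
            continue  # codon choice cannot affect this pair
        if c1[2] == dinuc[0] and c2[0] == dinuc[1]:
            observed += 1
        expected += e
    return observed / expected if expected else math.nan
```

Pooling counts across all codon pairs keeps the sketch short; a per-amino-acid-pair breakdown would be a natural refinement if individual contexts need to be inspected.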


Introduction

Certain dinucleotides, two nucleotides adjacent in a sequence, are known to be over- or under-represented in the genomes of living organisms, creating distinct compositional patterns [1]. Organisms with methylated genomes such as vertebrates and plants have low levels of CpG dinucleotides. This is not the case for methylase-absent organisms such as invertebrates, bacteria and fungi [2,3]. UpA deprivation is consistently present in most living organisms, including prokaryotes. This bias is suspected to be due to UpA-rich mRNA being unstable and more prone to degradation by cytoplasmic
