Abstract

Nucleotide skew analysis is a versatile method to study the nucleotide composition of RNA/DNA molecules, in particular to reveal characteristic sequence signatures. For instance, skew analysis of several viral RNA genomes indicated that the nucleotide bias is enriched in the unpaired, single-stranded genome regions, thus creating an even more striking virus-specific signature. However, the comparison of skew graphs for many virus isolates or families is difficult, time-consuming, and nonquantitative. Here, we present a procedure for a simpler identification of similarities and dissimilarities between nucleotide skew data of coronavirus, flavivirus, picornavirus, and HIV-1 RNA genomes. Window and step sizes are normalized to correct for differences in the length of the viral genomes. Cumulative skew data are converted into pairwise Euclidean distance matrices, which can be presented as neighbor-joining trees. We present skew value trees for the four virus families and show that closely related viruses are placed in small clusters. Importantly, the skew value trees are similar to the trees constructed by a “classical” model of evolutionary nucleotide substitution. Thus, we conclude that the simple calculation of Euclidean distances between nucleotide skew data allows an easy and quantitative comparison of characteristic sequence signatures of virus genomes. These results indicate that the Euclidean distance analysis of nucleotide skew data is a useful addition to the virology toolbox.
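As an illustration of the length normalization mentioned above, the following is a minimal sketch of a cumulative skew calculation. It is not the GenSkew implementation: the function name `cumulative_skew`, the default of 100 windows, the use of non-overlapping windows (step size equal to window size), and the skew formula (n1 - n2) / (n1 + n2) per window are all assumptions made for the example. Using a fixed number of windows per genome means that the window size scales with genome length, so that genomes of different lengths yield profiles of equal length.

```python
def cumulative_skew(seq, nuc1="G", nuc2="C", n_windows=100):
    """Return the cumulative nuc1/nuc2 skew over n_windows equal slices.

    Assumption: the per-window skew is (n1 - n2) / (n1 + n2), and the
    windows do not overlap (step size == window size).
    """
    seq = seq.upper()
    # Fractional window size, so every genome is cut into the same
    # number of windows regardless of its length.
    size = len(seq) / n_windows
    total = 0.0
    values = []
    for i in range(n_windows):
        window = seq[round(i * size):round((i + 1) * size)]
        n1, n2 = window.count(nuc1), window.count(nuc2)
        if n1 + n2:  # skip windows that contain neither nucleotide
            total += (n1 - n2) / (n1 + n2)
        values.append(total)
    return values
```

For example, `cumulative_skew(genome, "G", "C")` would give a G/C skew profile of 100 values for any genome, which makes profiles from different viruses directly comparable position by position.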

Highlights

  • Nucleotide skew analysis [1] provides a powerful tool to visualize compositional aspects of a DNA/RNA sequence

  • We demonstrated for a representative collection of RNA viruses that the skew plots can be interpreted as “nucleotide compositional signatures” of the viral genomes and that these characteristic signatures are more prominent in the single-stranded regions than in the base-paired, double-stranded regions of a viral RNA genome [5]

  • We developed a simple mathematical addition to GenSkew analysis that converts skew data into a pairwise Euclidean distance matrix, which can be clustered into a neighbor-joining tree, facilitating the identification of putative relationships, e.g., between viral sequences. The key result of this study is that this Euclidean algorithm offers an easy and quantitative interpretation of nucleotide skew data of virus genomes. The construction of Euclidean distance trees based on skewed nucleotide compositions does not require a prior alignment of the sequences
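The conversion of skew data into a pairwise Euclidean distance matrix can be sketched as follows. This is an illustrative example, not the authors' code: the function name `euclidean_distance_matrix` and the dictionary layout are hypothetical, and it assumes that each genome has already been reduced to a skew profile of the same length (e.g., by using the same number of windows for every genome).

```python
import math


def euclidean_distance_matrix(profiles):
    """Return pairwise Euclidean distances between skew profiles.

    profiles: dict mapping a genome name to a list of skew values.
    Assumption: all profiles have the same length.
    """
    names = list(profiles)
    matrix = {a: {} for a in names}
    for a in names:
        for b in names:
            # math.dist computes sqrt(sum((x - y) ** 2)) over paired values
            matrix[a][b] = math.dist(profiles[a], profiles[b])
    return matrix
```

The resulting symmetric matrix could then be passed to any neighbor-joining implementation to build a skew value tree. No sequence alignment is involved at any step, because the distances are computed between skew profiles rather than between aligned nucleotides.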


Summary

Introduction

Nucleotide skew analysis [1] provides a powerful tool to visualize compositional aspects of a DNA/RNA sequence. We demonstrated that purine enrichment in the Zika virus RNA genome [6] is a general property of most but not all Flaviviridae and, surprisingly, is prominently observed at the first position of the codons and not at the silent third codon position (unpublished results). It is difficult, time-consuming, and nonquantitative to compare different skew graphs with respect to similarities and dissimilarities. We demonstrate that skew distance trees and phylogenetic trees are surprisingly similar but not identical.

Computational and Mathematical Methods in Medicine
