Background: Multiple sequence alignment (MSA) algorithms are the traditional way to compare and analyze DNA sequences. However, for long DNA sequences, these algorithms are computationally expensive. Objective: We propose a new numerical method to characterize and compare DNA sequences quickly. Method: Based on a new 2-dimensional (2D) graphical representation of DNA sequences, we obtain an 8-dimensional vector using two basic concepts of probability, the mean and the variance. Results: We perform similarity/dissimilarity analyses on two real DNA data sets: the coding sequences of the first exon of the beta-globin gene of 11 species, and 31 mammalian mitochondrial genomes. Conclusion: Our results agree with existing analyses in the literature. We also compare our approach with other methods and find that ours is more effective.
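The abstract does not specify the 2D graphical representation or how the eight components are formed, so the following is only an illustrative sketch of the general idea of an alignment-free, mean-and-variance descriptor. It assumes a hypothetical stand-in: for each of the four nucleotides, the mean and the population variance of its normalized positions in the sequence, giving 4 × 2 = 8 components. The names `feature_vector` and `distance`, and the choice of Euclidean distance, are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"


def feature_vector(seq):
    """Return a hypothetical 8-D descriptor of a DNA sequence.

    ASSUMPTION: this is a stand-in, not the paper's construction.
    For each nucleotide, it records the mean and the population
    variance of the normalized positions (i / n) where it occurs.
    """
    seq = seq.upper()
    n = len(seq)
    features = []
    for base in NUCLEOTIDES:
        # Normalized 1-based positions of this nucleotide, in (0, 1].
        positions = [(i + 1) / n for i, ch in enumerate(seq) if ch == base]
        if positions:
            mean = sum(positions) / len(positions)
            var = sum((p - mean) ** 2 for p in positions) / len(positions)
        else:
            # A nucleotide that never occurs contributes zeros.
            mean, var = 0.0, 0.0
        features.extend([mean, var])
    return features


def distance(u, v):
    """Euclidean distance between two descriptors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = feature_vector("ATGGTGCACCTGACTCCTGA")
    b = feature_vector("ATGGTGCATCTGTCCAGTGA")
    print(len(a), distance(a, b))
```

Computing each descriptor takes time linear in the sequence length, and comparing two sequences reduces to a distance between two fixed-length vectors, which is the kind of saving over alignment that the abstract points to.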