Abstract

The graphical representation of DNA, which first assigns each nucleotide a numerical vector and then concatenates the vectors into a zigzag curve, provides a powerful tool for analyzing genetic sequences. Here we introduce two common geometric quantities, area and curvature, to DNA sequence analysis based on this graphical representation. We then derive a higher-dimensional vector for each DNA sequence by permuting the assignment of representation vectors to nucleotides. Once the geometric descriptors of the sequences are obtained, we can perform similarity/dissimilarity analysis of multiple sequences simultaneously. We tested our approach on three real biological data sets: the coding sequences of the beta globin gene of animals, the ribulose bisphosphate carboxylase small chain gene of plants, and the mitochondrial genome sequences of mammals. The results of our computational approach are consistent with those obtained by existing methods. Moreover, the cost of computing the descriptors grows linearly with sequence length, which makes the algorithm much faster than traditional multiple sequence alignment. These results indicate that our work provides a powerful tool for accurately and quickly quantifying the similarity of two DNA sequences.
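
As a rough illustration of the pipeline described in the abstract, the following Python sketch builds a zigzag curve, computes an area and a curvature-like quantity, and repeats this over all 24 assignments of vectors to nucleotides. The abstract does not give the representation vectors or the exact definitions of area and curvature, so the `VECTORS` values, the trapezoid-rule area, the total turning angle used as a curvature proxy, and the Euclidean distance are all assumptions made for illustration, not the authors' formulas.

```python
from itertools import permutations
import math

NUCLEOTIDES = "ACGT"
# Hypothetical representation vectors (assumption; not taken from the paper).
VECTORS = [(1.0, 1.0), (1.0, -1.0), (1.0, 0.5), (1.0, -0.5)]


def zigzag(seq, assignment):
    """Concatenate the per-nucleotide vectors into a curve starting at the origin."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq:
        dx, dy = assignment[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def signed_area(points):
    """Signed area between the curve and the x-axis (trapezoid rule; assumed definition)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def total_turning(points):
    """Sum of absolute turning angles at interior vertices (assumed curvature proxy).

    Every vector in VECTORS has a positive x-component, so segment angles lie in
    (-pi/2, pi/2) and their differences need no wrap-around correction.
    """
    total = 0.0
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        total += abs(a2 - a1)
    return total


def descriptor(seq):
    """Area and turning for each of the 4! = 24 nucleotide-to-vector assignments."""
    features = []
    for perm in permutations(VECTORS):
        assignment = dict(zip(NUCLEOTIDES, perm))
        points = zigzag(seq, assignment)
        features.append(signed_area(points))
        features.append(total_turning(points))
    return features


def distance(d1, d2):
    """Euclidean distance between two descriptors (assumed dissimilarity measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))
```

Each assignment requires one pass over the sequence and the number of assignments is fixed at 24, so the descriptor is computed in time linear in the sequence length, in line with the complexity claim in the abstract. Comparing two sequences then reduces to `distance(descriptor(s1), descriptor(s2))`; in practice the raw values would probably need normalizing by sequence length before sequences of different lengths are compared.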
