Abstract

Classification of DNA sequences is an important problem in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are now popular, but the manual intervention they require often reduces accuracy, and most of them neglect the interactions among nucleotides. Here we propose a new Accumulated Natural Vector (ANV) method, which represents each DNA sequence by a point in ℝ¹⁸. By calculating the Accumulated Indicator Functions of the nucleotides, we obtain an Accumulated Natural Vector for each sequence. This vector not only captures the distribution of each nucleotide but also provides the covariance among nucleotides. Global comparison of DNA sequences or genomes can therefore be carried out easily in ℝ¹⁸. Tests on datasets of different sizes and types demonstrate the accuracy and time efficiency of the proposed ANV method.
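To make the construction concrete, the following is a minimal Python sketch of how an 18-dimensional vector of this kind could be assembled from accumulated indicator functions. The abstract does not give the formulas, so the function names (`accumulated_indicator`, `accumulated_vector`, `euclidean`), the ordering of the components, and the normalization by sequence length are all assumptions made for illustration and may differ from the definitions used in the paper.

```python
from itertools import accumulate, combinations

NUCLEOTIDES = "ACGT"


def accumulated_indicator(seq, nucleotide):
    """Running count of `nucleotide` at each position of `seq`."""
    return list(accumulate(1 if base == nucleotide else 0 for base in seq))


def accumulated_vector(seq):
    """Return an 18-dimensional feature vector for a DNA sequence.

    Layout (an assumption, not taken from the paper):
    4 counts, 4 means, 4 variances, 6 pairwise covariances of the
    accumulated indicator functions, each averaged over sequence length.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must be non-empty")

    acc = {k: accumulated_indicator(seq, k) for k in NUCLEOTIDES}
    counts = [acc[k][-1] for k in NUCLEOTIDES]
    means = {k: sum(acc[k]) / length for k in NUCLEOTIDES}
    centered = {k: [v - means[k] for v in acc[k]] for k in NUCLEOTIDES}

    variances = [sum(c * c for c in centered[k]) / length for k in NUCLEOTIDES]
    # Six unordered nucleotide pairs: AC, AG, AT, CG, CT, GT.
    covariances = [
        sum(a * b for a, b in zip(centered[k], centered[l])) / length
        for k, l in combinations(NUCLEOTIDES, 2)
    ]
    return counts + [means[k] for k in NUCLEOTIDES] + variances + covariances


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Under these assumptions, two sequences of any length can be compared with `euclidean(accumulated_vector(s1), accumulated_vector(s2))`, which is what makes a global, alignment-free comparison cheap: each sequence is scanned once and reduced to 18 numbers.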

Highlights

  • With the rapid development of Next-Generation Sequencing technology, more and more genome sequence information is available

  • We propose an Accumulated Natural Vector approach, which maps each sequence to a point in ℝ¹⁸, where the additional six dimensions describe the covariance between nucleotides

  • The Accumulated Natural Vector gives more accurate results at a much lower computational cost than the other methods compared


Introduction

With the rapid development of Next-Generation Sequencing technology, more and more genome sequence information is available. Published alignment-free methods include Markov chain models (Apostolico and Denas, 2008), chaos theory (Hatje and Kollmar, 2012), and methods based on the frequency statistics of fixed-length oligomers, known as k-mers (Sims et al., 2009). Yau and his team proposed the natural vector method, which takes the position of each nucleotide into consideration. We propose a new Accumulated Natural Vector and Covariance
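For orientation, the two families of existing features mentioned above can be sketched in a few lines of Python. The k-mer function counts overlapping fixed-length segments; the natural vector function uses the commonly cited 12-dimensional form (count, mean position, and normalized second central moment of positions for each nucleotide). The function names and the convention of returning zeros for an absent nucleotide are assumptions made for this sketch, not details taken from the text.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer in `seq`."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if k <= 0 or total <= 0:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}


def natural_vector(seq):
    """12-dimensional natural vector: counts, mean positions, second moments.

    Positions are 1-indexed. For nucleotide k with n_k occurrences in a
    sequence of length n, the second moment is sum((p - mu_k)^2) / (n_k * n).
    A nucleotide that never occurs contributes zeros (a convention assumed here).
    """
    seq = seq.upper()
    length = len(seq)
    counts, means, moments = [], [], []
    for nuc in NUCLEOTIDES:
        positions = [i for i, base in enumerate(seq, start=1) if base == nuc]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        counts.append(n_k)
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in positions) / (n_k * length))
    return counts + means + moments
```

Unlike k-mer frequencies, whose dimension grows as 4^k, the natural vector has a fixed length regardless of sequence size, but it treats each nucleotide separately and carries no information about how the nucleotides vary together.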
