Abstract

Alignment-free methods are increasingly used to estimate distances between DNA and protein sequences and to reconstruct phylogenetic trees. Most distance functions used by these methods, however, are heuristic measures of dissimilarity, not based on any explicit model of evolution. Herein, we propose a simple estimator of the evolutionary distance between two DNA sequences calculated from the number of (spaced) word matches between them. We show that this distance function estimates the evolutionary distance between DNA sequences more accurately than other distance measures used by alignment-free methods. In addition, we calculate the variance of the number of (spaced) word matches depending on sequence length and mismatch probability.
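
To illustrate the quantity the estimator is built on, the following is a minimal sketch of counting spaced-word matches between two DNA sequences. It is not the authors' implementation. The function names, the binary pattern notation ("1" for a match position, "0" for a don't-care position), and the choice to count every pair of positions whose spaced words agree are assumptions made for illustration. The abstract does not state the distance formula, so the sketch does not attempt it.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield the spaced word at every start position of seq.

    pattern is a hypothetical binary string: '1' marks a match position,
    '0' marks a don't-care position that is skipped.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        yield "".join(seq[start + i] for i in match_positions)


def count_spaced_word_matches(seq_a, seq_b, pattern):
    """Count pairs of positions (i, j) whose spaced words are identical.

    Assumption: every matching pair of positions is counted, including
    pairs that do not correspond to homologous positions.
    """
    counts_a = Counter(spaced_words(seq_a, pattern))
    counts_b = Counter(spaced_words(seq_b, pattern))
    # A word seen n times in seq_a and m times in seq_b gives n * m pairs.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(count_spaced_word_matches("ACGTA", "ATGCA", "101"))
```

With the pattern "101", only the first and third characters of each three-letter window are compared, so windows that differ in the middle position still count as a match. A pattern consisting only of "1" characters reduces this to ordinary contiguous word matches.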
