Phylogeny reconstruction based on the length distribution of k-mismatch common substrings

Burkhard Morgenstern, Svenja Schöbel, Chris-André Leimeister

doi:10.1186/s13015-017-0118-8

Open Access

Journal: Algorithms for Molecular Biology
Publication Date: Dec 1, 2017
Citations: 19
License type: open-access

Affiliation: University of Göttingen

Abstract

Background: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487–1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings.

Results: In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitutions per position can be accurately estimated from the position of a local maximum in the length distribution of their k-mismatch common substrings.

Highlights

Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences
Other approaches are based on the matching statistics [10], that is, on the length of common substrings of the input sequences [11, 12]
Since there is no exact solution to the k-mismatch longest common substring problem that is fast enough to be applied to long genomic sequences, we proposed a simple heuristic: we first search for longest exact matches and then extend these matches until the (k + 1)st mismatch occurs
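
The heuristic in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the exact-match search is a naive quadratic scan chosen for readability, which would be far too slow for genomic sequences. The sketch assumes that ties between equally long exact matches are broken by taking the first one found, and that the reported length includes the k mismatched positions.

```python
def longest_exact_match(s1, i, s2):
    """Return (length, j): the longest exact match between s1[i:] and
    some s2[j:]. Naive scan over all j; illustration only."""
    best_len, best_j = 0, -1
    for j in range(len(s2)):
        l = 0
        while i + l < len(s1) and j + l < len(s2) and s1[i + l] == s2[j + l]:
            l += 1
        if l > best_len:  # first longest match wins (assumed tie-breaking)
            best_len, best_j = l, j
    return best_len, best_j


def extend_with_mismatches(s1, i, s2, j, start, k):
    """Extend the match of s1[i:] and s2[j:] beyond offset `start`,
    stopping right before the (k + 1)st mismatch. Returns total length."""
    l = start
    mismatches = 0
    while i + l < len(s1) and j + l < len(s2):
        if s1[i + l] != s2[j + l]:
            mismatches += 1
            if mismatches > k:
                break
        l += 1
    return l


def k_mismatch_lengths(s1, s2, k):
    """For every position i in s1, return the length of the heuristic
    k-mismatch common substring starting at i (0 if nothing matches)."""
    lengths = []
    for i in range(len(s1)):
        exact_len, j = longest_exact_match(s1, i, s2)
        if j < 0:
            lengths.append(0)
        else:
            lengths.append(extend_with_mismatches(s1, i, s2, j, exact_len, k))
    return lengths


if __name__ == "__main__":
    print(k_mismatch_lengths("ACGTTT", "ACGATT", 1))
```

Because the extension starts from the longest exact match rather than from every possible alignment, the result is not guaranteed to be the longest k-mismatch common substring; this is the trade-off the heuristic makes for speed.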

Summary

Introduction

Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. In one such approach, distances are calculated from the average length of k-mismatch common substrings, analogous to the average common substring (ACS) approach; the implementation of this approach is called kmacs.
