Abstract

In the previous chapters, we discussed problems involving an exact match of string patterns. We now turn to problems involving similar but not necessarily exact pattern matches. There are a number of similarity or distance measures, and many of them are special cases or generalizations of the Levenshtein metric. The problem of evaluating the measure of string similarity has numerous applications, including one arising in the study of the evolution of long molecules such as proteins. In this chapter, we focus on the problem of evaluating a longest common subsequence, which is expressively equivalent to the simple form of the Levenshtein distance. The Levenshtein distance is a metric that measures the similarity of two strings. In its simple form, the Levenshtein distance, D(x , y), between strings x and y is the minimum number of character insertions and/or deletions (indels) required to transform string x into string y. A commonly used generalization of the Levenshtein distance is the minimum cost of transforming x into y when the allowable operations are character insertion, deletion, and substitution, with costs δ(λ , σ), δ(σ, λ), and δ(σ1, σ2) , that are functions of the involved character(s). There are direct correspondences between the Levenshtein distance of two strings, the length of the shortest edit sequence from one string to the other, and the length of the longest common subsequence (LCS) of those strings. If D is the simple Levenshtein distance between two strings having lengths m and n, SES is the length of the shortest edit sequence between the strings, and L is the length of an LCS of the strings, then SES = D and L = (m + n — D)/2. We will focus on the problem of determining the length of an LCS and also on the related problem of recovering an LCS. Another related problem, which will be discussed in Chapter 6, is that of approximate string matching, in which it is desired to locate all positions within string y which begin an approximation to string x containing at most D errors (insertions or deletions).

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.