Approximate String Matching with Swap and Mismatch

Ohad Lipsky, B Riva Shalom, Elly Porat, Asaf Tzur, Benny Porat

doi:10.1007/978-3-540-77120-3_75

Abstract

Finding the similarity between two sequences is a major problem in computer science. It is motivated by many issues from computational biology as well as from information retrieval and image processing. These fields take into account possible corruptions of the data caused by genome rearrangements, typing mistakes, and more. Therefore, many applications do not require exact equality of the sequences, but rather an approximate matching. We consider mismatches and swaps as natural mistakes, of which a small number is allowed. The edit distance problem with swap and mismatch operations was discussed by Amir et al. [3]. They solved the problem in \(O(n\sqrt{m}\log m)\) time. Since then, the problem of string matching with at most k swap and mismatch errors has remained open. In this paper we present an algorithm that finds all locations where the pattern has at most k mismatch and swap errors in time \(O(n\sqrt{k\log m})\).

Keywords

Edit Distance, String Match, Text Segment, Text Location, Mismatch Error
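
To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of this paper and does not achieve the stated time bound; it runs in O(nm) time and only illustrates what "at most k swap and mismatch errors" means. It assumes that a swap exchanges two adjacent, different pattern characters, that each pattern position takes part in at most one operation, and that every swap or mismatch costs 1. The function names `swap_mismatch_distance` and `find_matches` are hypothetical and chosen for illustration only.

```python
from typing import List


def swap_mismatch_distance(pattern: str, window: str) -> int:
    """Minimum number of swaps plus mismatches turning pattern into window.

    Assumption (not taken from the paper's text): both strings have equal
    length, a swap exchanges two adjacent distinct pattern characters, and
    each position participates in at most one operation.
    """
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have equal length")

    prev2 = 0  # best cost for the prefix of length i - 2
    prev1 = 0  # best cost for the prefix of length i - 1
    for i in range(1, len(pattern) + 1):
        # Option 1: align position i - 1 directly (cost 1 on a mismatch).
        cur = prev1 + (pattern[i - 1] != window[i - 1])
        # Option 2: explain positions i - 2 and i - 1 by a single swap.
        if (
            i >= 2
            and pattern[i - 1] != pattern[i - 2]
            and pattern[i - 1] == window[i - 2]
            and pattern[i - 2] == window[i - 1]
        ):
            cur = min(cur, prev2 + 1)
        prev2, prev1 = prev1, cur
    return prev1


def find_matches(text: str, pattern: str, k: int) -> List[int]:
    """Start indices of text windows within distance k of pattern (naive scan)."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if swap_mismatch_distance(pattern, text[i:i + m]) <= k
    ]
```

For example, under these assumptions `find_matches("xbacdy", "abcd", 1)` reports only index 1, where the window `bacd` differs from the pattern by a single swap of `a` and `b`.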

Full Text