In the approximate string matching problem, differences are allowed between the pattern string P and each of its occurrences in the text string T, and the goal is to find all occurrences of P in T with at most k differences. Here we consider weighted differences (errors) between P and T and develop fast sequential and parallel algorithms. In particular, we allow the following types of errors: a mismatch, whose weight depends on the two mismatching characters; an extra character, with constant weight; a missing character, with constant weight; and a transposition of two consecutive characters, with constant weight. A set of theoretical results makes it possible to extend known algorithms to solve this problem in O(kn) sequential time and O(k + log m) parallel time on a 4PRAM model with max{n + k + 1, m p2} processors, where k is the maximum sum of the error weights, n is the length of T, and m is the length of P.
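To make the error model concrete, the sketch below uses the classical O(mn) dynamic program for approximate matching, in which an occurrence may start at any text position, extended with a case for adjacent transpositions. It is not the O(kn) algorithm described above; it is only a simple baseline under stated assumptions. The function name `weighted_matches` and its parameters (`sub_cost`, `ins_cost`, `del_cost`, `trans_cost`) are hypothetical. The sketch reports every 1-based text end position j at which some substring of T ending at j matches P with total weight at most k.

```python
from typing import Callable, List, Tuple


def weighted_matches(pattern: str, text: str, k: float,
                     sub_cost: Callable[[str, str], float],
                     ins_cost: float = 1.0, del_cost: float = 1.0,
                     trans_cost: float = 1.0) -> List[Tuple[int, float]]:
    """Hypothetical O(mn) baseline: return (j, cost) for each 1-based end
    position j in text where a substring ending at j matches pattern
    with total error weight <= k."""
    m, n = len(pattern), len(text)
    # D[i][j] = minimum weight to match pattern[:i] against some substring
    # of text ending at position j. Row 0 is all zeros, so an occurrence
    # may begin anywhere in the text.
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * del_cost  # every pattern character is missing
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            a, b = pattern[i - 1], text[j - 1]
            # Match (free) or mismatch (weight depends on both characters).
            best = D[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b))
            # Extra character in the text, constant weight.
            best = min(best, D[i][j - 1] + ins_cost)
            # Character of the pattern missing from the text, constant weight.
            best = min(best, D[i - 1][j] + del_cost)
            # Transposition of two consecutive characters, constant weight
            # (restricted form: the swapped pair is not edited further).
            if i > 1 and j > 1 and a == text[j - 2] and pattern[i - 2] == b:
                best = min(best, D[i - 2][j - 2] + trans_cost)
            D[i][j] = best
    return [(j, D[m][j]) for j in range(1, n + 1) if D[m][j] <= k]
```

For example, `weighted_matches("abc", "xxacbxx", 1, lambda a, b: 1.0)` includes `(5, 1.0)`, because "acb" ending at position 5 is one transposition away from "abc". A character-dependent mismatch weight can be supplied through `sub_cost`, for instance one that charges less when two characters differ only in case.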