Fastest Pattern Matching in Strings

L Colussi

doi:10.1006/jagm.1994.1008

Abstract

An algorithm is presented that substantially improves on the algorithm of Boyer and Moore for pattern matching in strings, both in the worst case and on average. Both the Boyer and Moore algorithm and the new algorithm assume that the characters in the pattern and in the text are drawn from a given alphabet Σ of finite size. For a text of length n, the new algorithm performs 2n character comparisons in the worst case, whereas the Boyer and Moore algorithm requires 3n comparisons [4]. On average, the new algorithm requires fewer comparisons than Boyer and Moore, except for a binary alphabet, where the two algorithms perform roughly the same. For large patterns, the ratio between the average number of comparisons made by the Boyer and Moore algorithm and the average number made by the new algorithm is close to the alphabet size |Σ|. A shortcoming of the new algorithm is that preprocessing a pattern of length m requires O(m) time on average but O(m²) in the worst case. A mixed strategy between the two algorithms is suggested to make the preprocessing linear, at the cost of slightly less efficient performance. The new algorithm was obtained from the Boyer and Moore algorithm by using a correctness proof (in the style of Hoare's axiomatic semantics) as a tool for improving algorithms.
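To make "character comparisons" concrete, here is a minimal Python sketch of the classic Boyer and Moore algorithm that the abstract uses as its baseline. It is not the new algorithm, whose details are not given here. It applies the bad-character and good-suffix shift rules in a common textbook formulation and counts every text/pattern character comparison. The function names and the naive O(m²) suffix computation in the preprocessing are illustrative assumptions chosen for brevity.

```python
def _suffixes(pattern: str) -> list[int]:
    """suff[i] = length of the longest suffix of pattern[:i+1] that is also a suffix of pattern.

    Computed naively in O(m^2) time for clarity (a simplification in this sketch).
    """
    m = len(pattern)
    suff = [0] * m
    for i in range(m):
        k = 0
        while k <= i and pattern[i - k] == pattern[m - 1 - k]:
            k += 1
        suff[i] = k
    return suff


def _good_suffix_shifts(pattern: str) -> list[int]:
    """gs[i] = shift to apply after a mismatch at pattern position i."""
    m = len(pattern)
    suff = _suffixes(pattern)
    gs = [m] * m
    # Case 1: pattern[:i+1] is both a prefix and a suffix (suff[i] == i + 1),
    # so it can be aligned with the end of the matched text.
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:
            while j < m - 1 - i:
                if gs[j] == m:
                    gs[j] = m - 1 - i
                j += 1
    # Case 2: the matched suffix also occurs ending at position i.
    for i in range(m - 1):
        gs[m - 1 - suff[i]] = m - 1 - i
    return gs


def boyer_moore(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (start positions of all occurrences, number of character comparisons)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, m = len(text), len(pattern)
    # Bad-character table: distance from the last occurrence of c in pattern[:m-1]
    # to the end of the pattern; characters not found there default to m.
    bad_char: dict[str, int] = {}
    for i in range(m - 1):
        bad_char[pattern[i]] = m - 1 - i
    good_suffix = _good_suffix_shifts(pattern)

    occurrences: list[int] = []
    comparisons = 0
    j = 0  # pattern[0] is currently aligned with text[j]
    while j <= n - m:
        # Scan right to left, counting every character comparison.
        i = m - 1
        while i >= 0:
            comparisons += 1
            if pattern[i] != text[i + j]:
                break
            i -= 1
        if i < 0:
            occurrences.append(j)
            j += good_suffix[0]
        else:
            bc_shift = bad_char.get(text[i + j], m) - m + 1 + i
            j += max(good_suffix[i], bc_shift)
    return occurrences, comparisons


if __name__ == "__main__":
    print(boyer_moore("hello world", "world"))
```

Running the sketch on texts over alphabets of different sizes gives empirical comparison counts to set beside the figures quoted above. Note that it reports all occurrences, and its simplified preprocessing is not meant to reflect the preprocessing analysis stated in the abstract.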
