Abstract

This paper presents a Boyer–Moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new algorithm handles patterns specified by regular expressions, and so generalizes the Boyer–Moore and Commentz-Walter algorithms. Like those algorithms, it uses shift functions that can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter's d1 and d2 shift functions. In certain cases, the Boyer–Moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth–Morris–Pratt (respectively Aho–Corasick) algorithm (as discussed by Watson in his Ph.D. Thesis, Eindhoven University of Technology, September 1995, and in: N. Ziviani, R. Baeza-Yates, K. Guimaraes (Eds.), Proc. Third South American Workshop on String Processing, International Informatics Series, vol. 4, Carleton University Press, Recife, Brazil, 1996, pp. 280–294). In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho–Corasick algorithm.
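To illustrate what a precomputed, tabulated shift function looks like, the following is a minimal sketch of the Horspool simplification of Boyer–Moore for a single keyword. It is not the regular expression algorithm of this paper, nor Commentz-Walter's d1 and d2 functions; the names `build_shift_table` and `horspool_search` are hypothetical and chosen only for this example.

```python
def build_shift_table(pattern: str) -> dict:
    """Precompute how far the search window may slide for each character.

    For every character in pattern[:-1], store the distance from its last
    occurrence to the end of the pattern. Characters that are absent from
    the table shift the window by the full pattern length.
    """
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> list:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)  # tabulated once, before scanning
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Slide by the table entry for the last character in the window.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(build_shift_table("abra"))
    print(horspool_search("abracadabra", "abra"))
```

The table is built once from the pattern alone, so the scan can skip over portions of the text without inspecting every character; this skipping is the property that the Boyer–Moore family exploits.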
