Effects of Suffix Repetition Rates of a String on the Performance of String Matching Algorithms

Yang Wang

doi:10.1109/icis.2009.29

Abstract

The highly efficient Boyer-Moore string matching algorithm uses information about repeated occurrences of a pattern's suffixes within the pattern to avoid backtracking during the search. One hypothesis is that the Boyer-Moore algorithm benefits even more from highly self-repetitive patterns. This paper studies how repeated occurrences of suffixes affect the performance of the Boyer-Moore algorithm and of several other well-known string search algorithms. It introduces a new concept, the suffix repetition rate (SRR), to measure how frequently the suffixes of a string occur inside the string. Using this measure, experiments were carried out with several thousand patterns covering the entire range of SRRs. The results show that increasing the SRR of a pattern string does not improve the efficiency of a search algorithm.
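To make the idea of suffixes recurring inside a string concrete, the following minimal Python sketch counts, for each proper suffix of a pattern, how many additional times it occurs earlier in the pattern. The abstract does not give the formal SRR definition, so this is not the paper's measure: the function name `suffix_reoccurrences` and the counting rule (overlapping occurrences that start before the suffix's own position) are assumptions made only for illustration.

```python
def suffix_reoccurrences(pattern: str) -> dict[str, int]:
    """Count earlier occurrences of each proper suffix inside the pattern.

    Illustrative assumption only: this is NOT the paper's SRR formula,
    which the abstract does not state.
    """
    counts: dict[str, int] = {}
    for start in range(1, len(pattern)):
        suffix = pattern[start:]
        # Count (possibly overlapping) occurrences that begin before
        # the position where the suffix itself starts.
        counts[suffix] = sum(
            1 for i in range(start) if pattern.startswith(suffix, i)
        )
    return counts


if __name__ == "__main__":
    for p in ("abcd", "abab", "aaaa"):
        print(p, suffix_reoccurrences(p))
```

Under this counting rule, a pattern with no repeated characters such as `abcd` yields zero for every suffix, while a highly self-repetitive pattern such as `aaaa` yields a positive count for every suffix.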

Full Text