Abstract

Let $\pi'_{w}$ denote the failure function of the Knuth-Morris-Pratt algorithm for a word w. In this paper we study the following problem: given an integer array $A'[1 \mathinner{\ldotp\ldotp} n]$, is there a word w over an arbitrary alphabet Σ such that $A'[i]=\pi'_{w}[i]$ for all i? Moreover, what is the minimum cardinality of Σ required? We give an elementary and self-contained $\mathcal{O}(n\log n)$ time algorithm for this problem, thus improving the previously known solution (Duval et al. in Conference in honor of Donald E. Knuth, 2007), which had no polynomial time bound. Using both deeper combinatorial insight into the structure of $\pi'$ and advanced algorithmic tools, we further improve the running time to $\mathcal{O}(n)$.

Highlights

  • Pattern Recognition and Failure Functions: Pattern matching algorithms have attracted much attention since the dawn of computer science.

  • Validation of border arrays is used by algorithms generating all valid border arrays [9, 11, 20].

  • Related Results: The study of validating arrays related to string algorithms and word combinatorics was started by Franek et al. [11], who gave an offline linear-time algorithm for border array validation.

Summary

Pattern Recognition and Failure Functions

Pattern matching algorithms have attracted much attention since the dawn of computer science. An early question was whether a linear-time algorithm for this problem exists. The first fully linear-time pattern matching algorithm is the Morris-Pratt algorithm [21], which is designed for the RAM machine model and is well known for its beautiful concept. It simulates the minimal DFA recognizing Σ∗p (where p denotes the pattern) by using a failure function πp, known as the border array. The Knuth-Morris-Pratt algorithm [17] improves on it by using an optimised failure function, namely the strict border array π′ (or strong failure function). Even Simon’s algorithm (i.e., the very first improvement) deals with periods of pattern prefixes augmented by a single text symbol rather than pure periods of pattern prefixes.
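To make the two failure functions concrete, the following minimal Python sketch computes both arrays for a given word. It is an illustration only, not the validation algorithm of the paper. It assumes the usual textbook conventions, which the text above does not spell out: π[i] is the length of the longest proper border of w[1..i]; π′[i] is the length of the longest proper border w[1..k] of w[1..i] with w[k+1] ≠ w[i+1], or -1 if no such border exists; and π′[n] = π[n]. The function names `border_array` and `strict_border_array` are hypothetical.

```python
def border_array(w: str) -> list[int]:
    """Return pi, where pi[i-1] is the length of the longest proper
    border of the prefix w[:i], for i = 1..n (Morris-Pratt failure function)."""
    n = len(w)
    pi = [0] * n
    k = 0  # length of the border currently being extended
    for i in range(1, n):
        # Fall back through shorter borders until one can be extended.
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi


def strict_border_array(w: str) -> list[int]:
    """Return pi', the strict border array (assumed convention: -1 means
    that no border satisfies the mismatch condition)."""
    n = len(w)
    pi = border_array(w)
    strict = [0] * n
    for i in range(1, n + 1):  # i is the prefix length (1-based position)
        k = pi[i - 1]
        if i == n:
            # No next letter to compare against: pi'[n] = pi[n].
            strict[i - 1] = k
        elif w[k] != w[i]:
            # Letter after the border differs from the letter after the prefix.
            strict[i - 1] = k
        elif k == 0:
            # Even the empty border fails the mismatch condition.
            strict[i - 1] = -1
        else:
            # Same next letter, so inherit the strict border of w[:k].
            strict[i - 1] = strict[k - 1]
    return strict


if __name__ == "__main__":
    print(border_array("abaab"))         # [0, 0, 1, 1, 2]
    print(strict_border_array("abaab"))  # [0, -1, 1, 0, 2]
```

In these terms, the validation problem asks the reverse question: given only an array such as `[0, -1, 1, 0, 2]`, decide whether some word produces it as its strict border array, and over how small an alphabet.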

Strict Border Array Validation
Preliminaries
Border Array Validation
Overview of the Algorithm
Details and Correctness
Performing Pin Value Checks
Performing Consistency Checks
Size of the Alphabet
Improving the Running Time to Linear
Suffix Trees for Polylogarithmic Alphabet
Compressing A
Performing Short Consistency Checks
Remarks and Open Problems