Abstract

The border and parameterized border (p-border) arrays are data structures used in pattern matching applications for traditional strings from the constant alphabet Σ, and parameterized strings (p-strings) from the constant alphabet Σ and the parameter alphabet Π. In this work, we introduce the structural border array (s-border) as defined for an n-length structural string (s-string) T. The s-string is a p-string with the existence of symbol complements in some alphabet Γ. These different alphabets add to both the intricacies and capabilities of pattern matching. For example, the s-string can handle the Watson–Crick base pairings in biological sequences and thus, assists in applications that deal with efficient pattern matching between RNA strands that share similar secondary structures. Initially, we provide a construction that executes in O(n2) time to build the s-border array. The paper establishes theory to improve the result to O(n) by proving particular properties of the s-border data structure. This result is significant because of the generalization of the s-string, which is a step beyond the p-string. Using the same construction algorithm, we show how to modify the s-string alphabets to also construct the p-border and the traditional border arrays in linear time. The generality of the s-border construction algorithm motivates us to devise pattern matching algorithms for s-matching, p-matching, and traditional matching. Our pattern matching algorithms are ultimately used to address the p-match problem with run-length encoded strings.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call