Abstract

The problem of generalized function matching can be defined as follows: given a pattern p = p1 ⋯ pm and a text t = t1 ⋯ tn, find a mapping f : ∑p→∑t⁎; and all text locations i such that f(p1)f(p2) ⋯ f(pm)=ti ⋯ tj, a substring of t.By modifying the restrictions of the matching function f, one can obtain different matching problems, many of which have important applications. When f : ∑p → ∑t we are faced with problems found in the well-established field of combinatorial pattern matching. If the single character constraint is lifted and f : ∑p→∑t⁎ we obtain generalized function matching as introduced by Amir and Nor (JDA 2007). If we further constrain f to be injective, then we arrive at generalized parametrized matching as defined by Clifford etal. (SPIRE 2009).There are a number of important applications for pattern matching in computational biology, text editors and data compression, to name a few. Therefore, many efficient algorithms have been developed for a wide variety of specific problems including finding tandem repeats in DNA sequences, optimizing embedded systems by reusing code etc.In this work we present a heuristic algorithm illustrating a practical approach to tackling a variant of generalized function matching where f : ∑p → ∑t+ and demonstrate its performance on human-produced text as well as random strings.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call