Abstract

A multi-track string is a tuple of strings of the same length. Given a pattern and a text that are both multi-track strings, the permuted pattern matching problem is to find the positions of all occurrences of permutations of the pattern in the text. In this paper, we propose several algorithms for permuted pattern matching. Our first algorithm, which is based on the Knuth–Morris–Pratt (KMP) algorithm, has O(mk) preprocessing time and O(nk log σ) matching time, where n, m, k, σ, and occ denote the length of the text, the length of the pattern, the number of strings in the multi-track, the alphabet size, and the number of occurrences of the pattern, respectively. We then improve the KMP-based algorithm by using an automaton, which gives a better experimental running time. The next proposed algorithms are based on the Boyer–Moore algorithm and the Horspool algorithm; these were the fastest algorithms in our experiments. Furthermore, we propose an extension of the AC-automaton algorithm that solves dictionary matching on multi-tracks, which is the task of finding multiple multi-track patterns in a multi-track text. Finally, we propose filtering algorithms that perform permuted pattern matching quickly in practice.

Highlights

  • The pattern matching problem on strings is to find all occurrence positions of a pattern string in a text string

  • We proposed some algorithms for permuted pattern matching on multi-track strings

  • We primarily focused on the algorithms that preprocess the pattern before performing permuted pattern matching, instead of constructing indexing structures from the text

Summary

Introduction

The pattern matching problem on strings is to find all occurrence positions of a pattern string in a text string. The permuted pattern matching problem is as follows: given two multi-track strings, a text T and a pattern, find the positions of all occurrences of permutations of the pattern in the text. The problem can be solved by constructing indexing structures from the text, such as multi-track suffix trees [7], multi-track position heaps [8], and filtering multi-set trees [9], or by preprocessing the pattern, as in the AC-automaton-based algorithm [7]. We propose several algorithms that solve permuted pattern matching quickly. These algorithms can be classified into three groups. Preliminary versions of this work appeared in [10,11].
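
To make the problem statement concrete, the following is a minimal brute-force sketch, not one of the algorithms proposed in the paper. It assumes that the text and the pattern have the same number of tracks and that positions are reported 0-based; the function name `permuted_match_naive` is hypothetical. A window of the text matches if its tracks, taken in some order, equal the tracks of the pattern, which can be checked by comparing the sorted lists of tracks.

```python
from typing import List, Sequence


def permuted_match_naive(text: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return 0-based positions where some reordering of the pattern
    tracks equals the text window (brute-force baseline).

    Assumption: text and pattern have the same number of tracks.
    """
    if len(text) != len(pattern):
        raise ValueError("text and pattern must have the same number of tracks")
    if not text:
        return []

    n = len(text[0])      # length of the text
    m = len(pattern[0])   # length of the pattern
    if any(len(t) != n for t in text) or any(len(p) != m for p in pattern):
        raise ValueError("all tracks of a multi-track string must have equal length")

    # Sorting the tracks gives a canonical order, so two multi-track strings
    # are permutations of each other iff their sorted track lists are equal.
    target = sorted(pattern)
    return [
        i
        for i in range(n - m + 1)
        if sorted(t[i:i + m] for t in text) == target
    ]


if __name__ == "__main__":
    print(permuted_match_naive(("aabc", "bbca"), ("bc", "ab")))
```

This baseline re-sorts the tracks of every window, so it only serves to define what a correct answer is; the pattern-preprocessing algorithms described in the abstract are aimed at avoiding that repeated work.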

Notation and Definition on Multi-Track Strings
KMP-Based Permuted Pattern Matching Algorithms
Multi-Track KMP Algorithm
Multi-Track AC-Automaton
Multi-Track Permuted Matching Automaton
Filtering Algorithm on a Multi-Track String
Experiments
Findings
Conclusions
