Abstract

We consider a sliding window W over a stream of characters from an alphabet of constant size. We want to look up a pattern in the current content of the sliding window and obtain the positions of all matches. We present an indexed version of the sliding window, based on a suffix tree. The data structure has size Θ(|W|), answers queries in optimal Θ(m+occ) time, and supports updates in amortized constant time, where m is the length of the query string and occ is its number of occurrences.
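To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not the suffix-tree index described in the abstract: the class name `NaiveSlidingWindow` and its methods are hypothetical, and each query costs O(|W|·m) rather than Θ(m+occ). It only shows what "all positions of the matches in the current window" means, assuming positions are counted from the start of the window and the pattern is non-empty.

```python
from collections import deque


class NaiveSlidingWindow:
    """Brute-force reference for the query semantics only (hypothetical helper,
    not the suffix-tree-based index)."""

    def __init__(self, capacity):
        self.capacity = capacity  # |W|, the maximum number of characters kept
        self.window = deque()

    def append(self, ch):
        # Add the newest character and drop the oldest one once the window is full.
        self.window.append(ch)
        if len(self.window) > self.capacity:
            self.window.popleft()

    def find_all(self, pattern):
        # Return every start position (relative to the window start) where
        # the non-empty pattern occurs in the current window content.
        text = "".join(self.window)
        m = len(pattern)
        return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]
```

For example, after feeding the stream "abracadabra" into a window of capacity 5, the window holds "dabra", and `find_all("a")` returns `[1, 4]`.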

Highlights

  • Text indexing, pattern matching, and big data in general are well-studied fields of computer science and engineering

  • One way of implementing string matching, including regular expressions, is by using finite automata [4,5]

  • String matching is used in digital forensics, where we typically match multiple regular expressions on massive amounts of data, which involves multiple streams and parallelism


Summary

Introduction and Related Work

Pattern matching, and big data in general, are well-studied fields of computer science and engineering. A practical suffix-array-based sliding window was proposed by Ferreira et al. [22,23], with speed improvements by Salson et al. [24]. Their approach supports efficient substring queries, but updating the suffix array requires at least linear time due to the nature of the array: inserting or removing an element requires the other elements to shift by one slot. It turns out that this operation is not trivial, due to details hidden in Ukkonen’s suffix tree construction algorithm. This is the first data structure for on-the-fly text indexing which requires amortized O(1) time for updates and worst-case optimal time for queries.
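The linear update cost of an array-based index can be seen in a small Python sketch. This is not the method of [22,23] or [24]; the helper `insert_with_shift` is hypothetical, and for readability the sorted suffixes are stored as strings rather than as start positions. It only illustrates that placing one new suffix into a sorted array forces every later element to move by one slot.

```python
import bisect


def insert_with_shift(arr, index, value):
    """Insert value at index by shifting later elements one slot to the right.
    Returns the number of elements moved (hypothetical helper for illustration)."""
    arr.append(None)  # make room for one more element at the end
    moved = 0
    for j in range(len(arr) - 1, index, -1):
        arr[j] = arr[j - 1]  # shift one element right
        moved += 1
    arr[index] = value
    return moved


# Sorted suffixes of "banana".
suffixes = sorted("banana"[i:] for i in range(len("banana")))

# Prepending 'a' to the text adds exactly one new suffix, "abanana";
# the existing suffixes keep their relative order.
new_suffix = "abanana"
rank = bisect.bisect_left(suffixes, new_suffix)  # binary search for its rank
moved = insert_with_shift(suffixes, rank, new_suffix)
```

Finding the rank is cheap, but the shift touches every element after it, which is what makes array updates linear in the worst case.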

Notation and Preliminaries
Suffix Tree
Ukkonen’s Online Suffix Tree Construction Algorithm
Sliding Suffix Tree
Queries
Maintenance
Conclusions and Open Problems