Abstract

We present a new algorithm for subsequence matching in grammar-compressed strings. Given a grammar of size n compressing a string of size N and a pattern string of size m over an alphabet of size $$\sigma$$, our algorithm uses $$O(n+\frac{n\sigma }{w})$$ space and either $$O(n+\frac{n\sigma }{w}+m\log N\log w\cdot occ)$$ or $$O(n+\frac{n\sigma }{w}\log w+m\log N\cdot occ)$$ time. Here w is the word size and occ is the number of minimal occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster when $$occ=o(\frac{n}{\log N})$$. The algorithm relies on a new data structure that efficiently finds the next occurrence of a given character after a given position in a compressed string. This data structure is in turn based on a new data structure for the tree color problem, in which the node colors are packed in bit strings.
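To make the notion of a minimal occurrence and the role of the next-occurrence query concrete, here is a minimal Python sketch. It works on an uncompressed string and is not the grammar-based algorithm described above. It assumes the usual definition of a minimal occurrence: a substring that contains the pattern as a subsequence while no proper substring of it does. The function names (`build_positions`, `next_occ`, `prev_occ`, `minimal_occurrences`) are hypothetical and chosen only for illustration, and the next-occurrence query is answered by binary search over per-character position lists instead of the compressed data structure.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_positions(s):
    """Map each character to the sorted list of indices where it occurs."""
    pos = defaultdict(list)
    for i, ch in enumerate(s):
        pos[ch].append(i)
    return pos


def next_occ(pos, ch, i):
    """Smallest index k >= i with s[k] == ch, or None if there is none."""
    lst = pos.get(ch, [])
    k = bisect_left(lst, i)
    return lst[k] if k < len(lst) else None


def prev_occ(pos, ch, i):
    """Largest index k <= i with s[k] == ch, or None if there is none."""
    lst = pos.get(ch, [])
    k = bisect_right(lst, i)
    return lst[k - 1] if k > 0 else None


def minimal_occurrences(s, p):
    """Return all minimal occurrences of p in s as inclusive (start, end) pairs."""
    if not p:
        return []
    pos = build_positions(s)
    result = []
    start = 0
    while True:
        # Forward pass: greedily match p from `start` to find the earliest end.
        j = start - 1
        for ch in p:
            j = next_occ(pos, ch, j + 1)
            if j is None:
                return result
        end = j
        # Backward pass: greedily match p in reverse to find the latest start.
        i = end + 1
        for ch in reversed(p):
            i = prev_occ(pos, ch, i - 1)
        result.append((i, end))
        # Any further minimal occurrence must start after this one.
        start = i + 1


print(minimal_occurrences("abcab", "ab"))
```

Each reported occurrence costs O(m) next-occurrence and previous-occurrence queries in this sketch, which mirrors the per-occurrence factor of m in the stated time bounds. The contribution summarized above lies in answering such queries directly on the grammar.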
