Abstract
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern \(P[1\mathinner{..}m]\) on a large repetitive text collection \(T[1\mathinner{..}n]\) over an alphabet of size \(\sigma\), which is represented as a (hopefully much smaller) run-length context-free grammar of size \(g_{rl}\). We show that the problem can be solved in time \(O(m^{2}\log^{\epsilon}n)\), for any constant \(\epsilon>0\), on a data structure of size \(O(g_{rl})\). Further, on a locally consistent grammar of size \(O(\delta\log\frac{n\log\sigma}{\delta\log n})\), the time decreases to \(O(m\log m(\log m+\log^{\epsilon}n))\). The value \(\delta\) is a function of the substring complexity of \(T\), and \(\Omega(\delta\log\frac{n\log\sigma}{\delta\log n})\) is a tight lower bound on the compressibility of repetitive texts \(T\), so our structure has optimal size in terms of \(n\), \(\sigma\), and \(\delta\). We extend our results to several related problems, such as finding \(k\)-MEMs, MUMs, rare MEMs, and applications.
Categories and Subject Descriptors: E.1 [Data structures]; E.2 [Data storage representations]; E.4 [Coding and information theory]: Data compaction and compression; F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems—Pattern matching, Computations on discrete structures, Sorting and searching
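To make the problem statement concrete, the following is a minimal brute-force sketch of what "computing the MEMs of \(P\) on \(T\)" asks for. It assumes the usual definition, which the abstract does not spell out: a MEM is a substring \(P[i\mathinner{..}j]\) that occurs in \(T\) and cannot be extended to the left or to the right and still occur in \(T\). The function names `longest_match_len` and `naive_mems` are hypothetical, positions are 0-based, and the code works on the plain uncompressed text. It is not the grammar-based method of the paper and does not attain the stated time bounds.

```python
def longest_match_len(text: str, pattern: str, i: int) -> int:
    """Length of the longest prefix of pattern[i:] that occurs in text."""
    length = 0
    # Grow the candidate one symbol at a time while it still occurs in text.
    while i + length < len(pattern) and pattern[i:i + length + 1] in text:
        length += 1
    return length


def naive_mems(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return the MEMs of pattern in text as (start, length) pairs (0-based).

    Assumed definition: pattern[start:start+length] occurs in text, and
    neither its one-symbol left extension nor its one-symbol right
    extension (within pattern) occurs in text.
    """
    mems = []
    prev_len = 0  # longest match length starting at position i - 1
    for i in range(len(pattern)):
        cur_len = longest_match_len(text, pattern, i)
        # Right-maximality holds by construction of cur_len.
        # Left-maximality: the match starting at i - 1 must not already
        # cover this one, i.e. prev_len < cur_len + 1.
        if cur_len > 0 and (i == 0 or prev_len < cur_len + 1):
            mems.append((i, cur_len))
        prev_len = cur_len
    return mems


if __name__ == "__main__":
    # The pattern "abxcde" splits around the mismatching symbol "x".
    print(naive_mems("abcde", "abxcde"))
```

Each substring test in this sketch scans the whole text, so its cost grows with \(n\). The structures described in the abstract instead work on the grammar, with query time that depends on \(m\) and only polylogarithmically on \(n\).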