Abstract
A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form \(P[\alpha ,\beta ]Q\), where \(P\) and \(Q\) are strings drawn from alphabet \(\varSigma \) and \(\alpha \) and \(\beta \) are the lower and upper bounds on the gap size \(g\). The gap size \(g\) is the number of don’t care characters between \(P\) and \(Q\). The dictionary matching problem with one gap is to index a collection of one-gap-patterns so as to identify all sub-strings of a query text \(T\) that match any one-gap-pattern in the collection. Let \({\mathcal D}\) be such a collection of \(d\) patterns, where \({\mathcal D}=\{P_i[\alpha _i,\beta _i]Q_i\mid 1\le i \le d\}\). Let \(n=\sum _{i=1}^d|P_i|+|Q_i|\). Let \(\gamma \) and \(\lambda \) be two parameters defined on \({\mathcal D}\) as follows: \(\gamma = |\{j\mid j \in [\alpha _i,\beta _i], 1\le i\le d\}|\) and \(\lambda = |\{\alpha _i,\beta _i \mid 1\le i\le d\}|\). Specifically, \(\gamma \) is the number of distinct gap lengths possible over all patterns in \({\mathcal D}\), and \(\lambda \) is the number of distinct gap boundaries across all the patterns. We present a linear-space solution (i.e., \(O(n)\) words) for answering a dictionary matching query on \({\mathcal D}\) in time \(O(|T| \gamma \log \lambda \log d+occ)\), where \(occ\) is the output size. The query time can be improved to \(O(|T|\gamma +occ)\) using \(O(n+d^{1+\epsilon })\) space, where \(\epsilon >0\) is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where the parameters \(\alpha _i\) and \(\beta _i\) are the same for all patterns, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases where gaps can occur at arbitrary locations and where gaps can be induced in the text rather than in the pattern.
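To make the definitions above concrete, the following is a minimal brute-force sketch in Python. It is not the index described in the abstract and does not achieve the stated bounds; it only illustrates what a one-gap-pattern match is and how \(\gamma \) and \(\lambda \) are computed from a dictionary. The tuple representation `(P, alpha, beta, Q)` and the function names `gamma_lambda` and `naive_matches` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a one-gap-pattern P[alpha, beta]Q
# is stored as the tuple (P, alpha, beta, Q).
OneGapPattern = Tuple[str, int, int, str]


def gamma_lambda(patterns: List[OneGapPattern]) -> Tuple[int, int]:
    """Return (gamma, lambda) following the set definitions in the abstract."""
    gap_lengths = set()  # every j with alpha_i <= j <= beta_i for some i
    boundaries = set()   # every alpha_i and beta_i
    for _, alpha, beta, _ in patterns:
        gap_lengths.update(range(alpha, beta + 1))
        boundaries.update((alpha, beta))
    return len(gap_lengths), len(boundaries)


def naive_matches(text: str, patterns: List[OneGapPattern]) -> List[Tuple[int, int, int]]:
    """Brute-force search (illustration only, not the paper's method).

    Returns sorted (pattern_index, start, gap) triples such that P occurs
    at `start`, followed by `gap` don't care characters, followed by Q.
    """
    found = []
    for i, (p, alpha, beta, q) in enumerate(patterns):
        for start in range(len(text) - len(p) + 1):
            if not text.startswith(p, start):
                continue
            for gap in range(alpha, beta + 1):
                q_start = start + len(p) + gap
                # Q must fit entirely inside the text.
                if q_start + len(q) <= len(text) and text.startswith(q, q_start):
                    found.append((i, start, gap))
    return sorted(found)


dictionary = [("ab", 1, 3, "cd"), ("x", 3, 5, "y")]
print(gamma_lambda(dictionary))                  # (5, 3)
print(naive_matches("abzzcdxqqqy", dictionary))  # [(0, 0, 2), (1, 6, 3)]
```

In this example the gap ranges \([1,3]\) and \([3,5]\) together cover the lengths 1 through 5, so \(\gamma = 5\), while the distinct boundaries are 1, 3, and 5, so \(\lambda = 3\).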