We study the online Parameterized Dictionary Matching with One Gap (PDMOG) problem, defined as follows. Preprocess a dictionary D of d patterns, each containing a special gap symbol that can match any string, so that given a text T arriving online, one character at a time, we can report all patterns from D that parameterized match suffixes of the text that has arrived so far, before the next character arrives. Two equal-length strings are a parameterized match if there exists a bijection on the alphabets under which one string matches the other. The gap symbols are associated with bounds determining the possible lengths of matching strings. Online Dictionary Matching with One Gap (DMOG) captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. Parameterized matching captures possible encryption of the patterns. We also define the strict PDMOG problem, in which subpatterns of the same dictionary pattern must be parameterized matched via the same bijection. This captures situations where subpatterns of a dictionary pattern are encoded simultaneously. We study this problem for a special case called alphabet-saturated dictionaries, where every subpattern contains all characters of the dictionary alphabet Σ. We use the following parameters to describe our results: D is the total size of the dictionary (not including the gaps), plsc is the length of the longest parameterized suffix chain of subpatterns in D, op is the number of parameterized pattern occurrences in T, α⁎ and β⁎ are the minimum left and maximum right gap borders in the non-uniformly bounded dictionary case, and δ(GD) is the degeneracy of the graph GD representing dictionary D. This graph is classified as sparse or dense according to the values of the δ(GD) and plsc parameters.
We obtain:
– Õ(D) preprocessing time/space and Õ(δ(GD)⋅plsc+plsc⋅max{|Σ|,M}+op) query time per text character for online PDMOG with sparse graph dictionaries.
– Õ(D+d(β⁎−α⁎)) preprocessing time/space and Õ(plsc⋅d⋅(β⁎−α⁎)+plsc⋅max{|Σ|,M}+op) query time per text character for online PDMOG with dense graph dictionaries.
– Õ(D) preprocessing time/space and Õ(δ(GD)⋅plsc+op) query time per text character for strict PDMOG with alphabet-saturated dictionaries.
These results parallel those obtained for the Dictionary Matching with One Gap (DMOG) problem, which almost match the lower bounds achieved for that problem [7]. While the parameter δ(GD) can be as large as d, and much larger if the dictionary has non-uniform gap boundaries, and the parameter plsc could theoretically be as large as d, in many practical situations these parameters are small. The strength of our work is in achieving results that exploit small values of these parameters, thus supplying algorithms that are suitable for some practical cyber-security needs.
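To make the definitions concrete, the following is a minimal brute-force sketch of a parameterized match and of the difference between the non-strict and strict variants for a single one-gap pattern. It is not the algorithm of this work and does not attain any of the bounds stated here; the function names `p_match` and `gapped_suffix_match`, and the representation of a pattern as two subpatterns `p1`, `p2` with gap length bounds `lo` and `hi`, are assumptions made only for illustration.

```python
def p_match(a: str, b: str) -> bool:
    """Return True if some bijection on characters maps a onto b position by position."""
    if len(a) != len(b):
        return False
    fwd, bwd = {}, {}  # forward and inverse character mappings
    for x, y in zip(a, b):
        # setdefault records the first pairing seen and returns it afterwards,
        # so any later conflicting pairing is detected in either direction.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


def gapped_suffix_match(text: str, p1: str, p2: str, lo: int, hi: int,
                        strict: bool = False) -> bool:
    """Naive check: does 'p1, gap of length lo..hi, p2' match a suffix of text?

    Non-strict: each subpattern may use its own bijection.
    Strict: both subpatterns must be matched via the same bijection.
    """
    start2 = len(text) - len(p2)  # p2 must be aligned with the end of the text
    for gap in range(lo, hi + 1):
        start1 = start2 - gap - len(p1)
        if start1 < 0:  # larger gaps would start before the text begins
            break
        s1 = text[start1:start1 + len(p1)]
        s2 = text[start2:]
        if strict:
            # One shared bijection is equivalent to matching the concatenation.
            ok = p_match(p1 + p2, s1 + s2)
        else:
            ok = p_match(p1, s1) and p_match(p2, s2)
        if ok:
            return True
    return False
```

For example, with subpatterns `ab` and `ba` and gap bounds 1 to 2, the text `xyzxy` is accepted in the non-strict setting (each subpattern is matched under its own bijection) but rejected in the strict setting, because no single bijection maps `ab` to `xy` and `ba` to `xy` at once.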