Compressed Pattern Matching Research Articles

Compressed pattern matching is an emerging research area that addresses the following problem: given a text file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW compressed files. The work includes an extension of Amir et al.'s well-known almost-optimal algorithm: the original algorithm has been improved to find not only the first occurrence of the pattern but also all other occurrences. A faster implementation for so-called simple patterns is also proposed. The work also includes a novel multiple-pattern matching algorithm using the Aho-Corasick algorithm. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the total length of all patterns, t is the size of the LZW trie, and r is the number of occurrences of the patterns. Extensive experiments have been conducted to test the performance of our algorithms and to compare them with other well-known compressed pattern matching algorithms, particularly the BWT-based algorithms and a similar multiple-pattern matching algorithm by Kida et al. that also uses the Aho-Corasick algorithm on LZW compressed data. The results show that our multiple-pattern matching algorithm is competitive with the best compressed pattern matching algorithms and is in practice the fastest among all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. The proposed algorithm is efficient for large files, and it is particularly efficient when applied to archive search if the archives are compressed with a common LZW trie. LZW is one of the most efficient and popular compression algorithms and is used extensively, and our method requires no modification of the compression algorithm. The work reported in this paper, therefore, has great economic and market potential.
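
To make the problem statement concrete, the following Python sketch searches an LZW code stream for all occurrences of a single pattern. It is a naive baseline and not the algorithm described in the abstract: it spells out each phrase from the decoder's trie and feeds the characters to a Knuth-Morris-Pratt automaton, so it touches every decoded character instead of working per code. The function names (`lzw_compress`, `failure_table`, `search_lzw`) and the choice to ship the alphabet alongside the codes are assumptions made for illustration.

```python
def lzw_compress(text):
    """Plain LZW over the characters of `text`; returns (alphabet, codes)."""
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes, phrase = [], ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch
        else:
            codes.append(table[phrase])
            table[phrase + ch] = len(table)  # new trie node
            phrase = ch
    if phrase:
        codes.append(table[phrase])
    return alphabet, codes


def failure_table(pattern):
    """KMP failure function: longest proper border of each prefix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def search_lzw(alphabet, codes, pattern):
    """Start offsets of all occurrences of `pattern` in the encoded text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # The LZW trie is stored as parent pointers: node -> (parent, last char).
    parent = [None] * len(alphabet)
    last = list(alphabet)
    first = list(alphabet)  # first character of each node's phrase
    fail = failure_table(pattern)
    state, pos, hits, prev = 0, 0, [], None
    for code in codes:
        if prev is not None:
            # Rebuild the encoder's entry: phrase(prev) + first char of the
            # current phrase. If `code` is the entry being created right now,
            # that first char equals the first char of phrase(prev).
            head = first[code] if code < len(last) else first[prev]
            parent.append(prev)
            last.append(head)
            first.append(first[prev])
        # Spell the phrase by walking from the node up to the root.
        chars, node = [], code
        while node is not None:
            chars.append(last[node])
            node = parent[node]
        for ch in reversed(chars):
            while state and ch != pattern[state]:
                state = fail[state - 1]
            if ch == pattern[state]:
                state += 1
            pos += 1
            if state == len(pattern):
                hits.append(pos - len(pattern))
                state = fail[state - 1]  # allow overlapping matches
        prev = code
    return hits
```

Calling `search_lzw(*lzw_compress(text), pattern)` gives the same offsets as scanning `text` directly. The algorithms in the abstract aim to avoid the inner per-character loop by precomputing, for each trie node, how its whole phrase moves the matching automaton.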


We consider the complexity of problems related to two-dimensional texts (2D-texts) described succinctly. In a succinct description, larger rectangular subtexts are defined in terms of smaller parts in a way similar to that of Lempel–Ziv compression for 1D-texts, or in strings with shortened descriptions as in (Nordic J. Comput. 4 (1997) 172–186), or in hierarchical graphs described by context-free graph grammars. A given 2D-text T with many internal repetitions can have a hierarchical description which may be exponentially smaller and which can be given as an input for a pattern-matching algorithm which gives information about T. Such a hierarchical description is given in terms of a straight-line program, or SLP (see Nordic J. Comput. 4 (1997) 172–186) or, equivalently, of a 2D grammar. We consider compressed pattern matching, where the input consists of a 2D-pattern P and a hierarchical description of a 2D-text T, and fully compressed pattern matching, where the input consists of hierarchical descriptions of both the pattern P and the text T. For 1D strings, there exist polynomial-time deterministic algorithms for these problems for similar types of succinct text descriptions (J. Comput. System Sci. 52 (2) (1996) 299–307; “Proceedings of the 27th Annual Symposium on the Theory of Computing,” pp. 703–712; “Proceedings of the 5th Scandinavian Workshop on Algorithm Theory,” Springer-Verlag, Berlin, 1996; Nordic J. Comput. 4 (2) (1997) 172–186). We show that the complexity dramatically increases in a 2D setting. For example, compressed 2D-matching is NP-complete, fully compressed 2D-matching is Σ₂ᴾ-complete, and testing a given occurrence of a 2D compressed pattern is co-NP-complete. On the other hand, we give efficient algorithms for the related problems of randomized equality testing and testing for a given occurrence of an uncompressed pattern. We also show the surprising fact that the compressed size of a subrectangle of a compressed 2D array can grow exponentially, unlike the 1D case.
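
As a rough illustration of what a hierarchical description of a 2D-text can look like, the sketch below models a 2D straight-line program in Python. The rule encoding is an assumption made here, not the paper's notation: each rule is a single symbol `("sym", c)`, a horizontal concatenation `("h", A, B)` of two equally tall rectangles, or a vertical concatenation `("v", A, B)` of two equally wide rectangles. The helper names `dims`, `cell`, and `doubling_example` are likewise hypothetical. The example shows that a few dozen rules can describe a text with about 2^41 cells, and that a single cell can still be read without expanding the text.

```python
def dims(rules, name, memo):
    """(height, width) of the rectangle derived from `name`, memoized."""
    if name not in memo:
        rule = rules[name]
        if rule[0] == "sym":
            memo[name] = (1, 1)
        else:
            op, a, b = rule
            ha, wa = dims(rules, a, memo)
            hb, wb = dims(rules, b, memo)
            if op == "h":  # a and b placed side by side
                assert ha == hb, "horizontal parts must have equal height"
                memo[name] = (ha, wa + wb)
            else:          # "v": a stacked on top of b
                assert wa == wb, "vertical parts must have equal width"
                memo[name] = (ha + hb, wa)
    return memo[name]


def cell(rules, name, row, col, memo):
    """Symbol at (row, col) of the derived text, found by descending rules."""
    while True:
        rule = rules[name]
        if rule[0] == "sym":
            return rule[1]
        op, a, b = rule
        ha, wa = dims(rules, a, memo)
        if op == "h":
            if col < wa:
                name = a
            else:
                name, col = b, col - wa
        else:
            if row < ha:
                name = a
            else:
                name, row = b, row - ha


def doubling_example(k):
    """Rules whose start symbol T{k} derives a 2^k x 2^(k+1) text of 'ab' rows."""
    rules = {"A": ("sym", "a"), "B": ("sym", "b"), "T0": ("h", "A", "B")}
    for i in range(1, k + 1):
        rules[f"H{i}"] = ("h", f"T{i-1}", f"T{i-1}")  # double the width
        rules[f"T{i}"] = ("v", f"H{i}", f"H{i}")      # double the height
    return rules


rules = doubling_example(20)
memo = {}
height, width = dims(rules, "T20", memo)  # computed from 43 rules only
```

Reading one cell costs time proportional to the depth of the rule hierarchy. The hardness results in the abstract concern the much harder question of whether a pattern occurs anywhere in such a text.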


Articles published on Compressed Pattern Matching

An approach for fast compressed text matching and to avoid false matching using WBTC and wavelet tree

Fast Pattern Matching in Compressed Text using Wavelet Tree

A new word-based compression model allowing compressed pattern matching

A review on compressed pattern matching

Faster Fully Compressed Pattern Matching by Recompression

Linear Compressed Pattern Matching for Polynomial Rewriting (Extended Abstract)

Improving Parse Trees for Efficient Variable-to-Fixed Length Codes

Compressed Matching in Dictionaries

Pattern Matching in LZW Compressed Files

Adapting the Knuth–Morris–Pratt algorithm for pattern matching in Huffman encoded texts

Collage system: a unifying framework for compressed pattern matching

On the Complexity of Pattern Matching for Highly Compressed Two-Dimensional Texts

Compressed and fully compressed pattern matching in one and two dimensions
