Abstract

The problem of compressed pattern matching, which has recently been treated in many papers dealing with free text, is extended to structured files, specifically to dictionaries, which appear in any full-text retrieval system. The prefix-omission method is combined with Huffman coding, and a new variant based on Fibonacci codes is presented. Experimental results suggest that the new methods are often preferable to earlier ones, in particular for small files, which are typical of dictionaries, since these are usually kept in small chunks.
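For readers unfamiliar with Fibonacci codes, the following is a minimal sketch of the standard Fibonacci code for positive integers, not the new variant the paper presents. The function names `fib_encode` and `fib_decode` are hypothetical, and codewords are shown as strings of '0' and '1' characters for readability rather than as packed bits.

```python
def fib_encode(n):
    """Return the standard Fibonacci codeword of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci codes are defined for integers >= 1")
    # Fibonacci numbers 1, 2, 3, 5, 8, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    if fibs[-1] > n:  # only happens for n == 1
        fibs.pop()
    # Greedy (Zeckendorf) decomposition: it never selects two adjacent Fibonacci numbers.
    bits = ["0"] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # Least significant position first; the appended '1' creates the terminating "11".
    return "".join(bits) + "1"


def fib_decode(stream):
    """Decode a concatenation of standard Fibonacci codewords into a list of integers."""
    values = []
    fibs = [1, 2]
    total, pos, prev = 0, 0, "0"
    for bit in stream:
        if bit == "1" and prev == "1":
            # Second consecutive '1': end of the current codeword.
            values.append(total)
            total, pos, prev = 0, 0, "0"
            continue
        if bit == "1":
            while pos >= len(fibs):
                fibs.append(fibs[-1] + fibs[-2])
            total += fibs[pos]
        pos += 1
        prev = bit
    if pos != 0:
        raise ValueError("stream ends in the middle of a codeword")
    return values


if __name__ == "__main__":
    numbers = [3, 1, 11, 4]
    encoded = "".join(fib_encode(n) for n in numbers)
    print(encoded)
    print(fib_decode(encoded))
```

Every codeword ends in two consecutive 1-bits and contains no other adjacent pair of 1-bits, so a decoder can find codeword ends without a lookup table.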

Highlights

  • The problem of Compressed Pattern Matching, introduced by Amir and Benson [1], is that of performing pattern matching directly in a compressed text, without decompressing it

  • For a given text T, a pattern P, and complementary encoding and decoding functions E and D, respectively, the aim is to search for E(P) in E(T), rather than to follow the usual approach of searching for P in the decompressed text D(E(T))

  • The experiments were performed on small prefix omission method (POM) files of several kilobytes each, because of the following application: POM is often used to store dictionaries in B-trees. Since the B-tree structure supports efficient access to memory pages, each node is limited to one page, and each page has to be compressed on its own; that is, for the first entry of each page, ℓ_1 = 0, because the page holds no preceding entry from which a prefix could be copied
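The per-page behaviour of POM described in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each entry is stored as a pair (ℓ, suffix), where ℓ is the number of leading characters shared with the previous entry; the helper names are hypothetical; and `page_size` counts entries per page, whereas a real B-tree page is bounded in bytes.

```python
def common_prefix_len(a, b):
    """Number of leading characters that strings a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pom_encode(words):
    """Encode a sorted word list as (copied_prefix_length, suffix) pairs."""
    entries = []
    prev = ""  # nothing precedes the first entry, so its prefix length is 0
    for word in words:
        ell = common_prefix_len(prev, word)
        entries.append((ell, word[ell:]))
        prev = word
    return entries


def pom_decode(entries):
    """Rebuild the word list from (copied_prefix_length, suffix) pairs."""
    words = []
    prev = ""
    for ell, suffix in entries:
        word = prev[:ell] + suffix
        words.append(word)
        prev = word
    return words


def pom_encode_pages(words, page_size):
    """Compress each page on its own (assumption: page_size is a number of entries)."""
    return [pom_encode(words[i:i + page_size])
            for i in range(0, len(words), page_size)]


if __name__ == "__main__":
    dictionary = ["compress", "compressed", "compression", "computer", "dictionary"]
    for page in pom_encode_pages(dictionary, page_size=2):
        print(page)
```

Because every page restarts with a prefix length of 0, a single page can be decoded, or searched, without reading any other page. The cost is that the first word of each page is stored in full.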


Summary

Introduction

The problem of Compressed Pattern Matching, introduced by Amir and Benson [1], is that of performing pattern matching directly in a compressed text, without decompressing it. Most research efforts in compressed matching have been invested in what could be called “classical” texts. These are texts, generally written in some natural language, that have been compressed by one of a variety of known compression techniques, such as Huffman coding [3] or one of the variants of the Lempel and Ziv (LZ) methods, including LZW [4,5,6], gzip, DoubleSpace and many others [7,8,9]. Note that these methods are general purpose and are not restricted to the compression of natural language texts.

Algorithms 2011, 4
