Compressed Matching in Dictionaries

Shmuel T Klein, Dana Shapira

doi:10.3390/a4010061

Abstract

The problem of compressed pattern matching, which has recently been treated in many papers dealing with free text, is extended to structured files, specifically to dictionaries, which appear in any full-text retrieval system. The prefix-omission method is combined with Huffman coding and a new variant based on Fibonacci codes is presented. Experimental results suggest that the new methods are often preferable to earlier ones, in particular for small files which are typical for dictionaries, since these are usually kept in small chunks.

Highlights

The problem of Compressed Pattern Matching, introduced by Amir and Benson [1], is that of performing pattern matching directly in a compressed text, without any decompression.
For a given text T, pattern P and complementary encoding and decoding functions E and D, respectively, our aim is to search for E(P) in E(T), rather than taking the usual approach, which searches for the pattern P in the decompressed text D(E(T)).
The experiments were performed on small files of several kilobytes compressed by the prefix omission method (POM), because of the following application: POM is often used to store dictionaries in B-trees. Since the B-tree structure supports efficient access to memory pages, each node is limited to a page size, and each page has to be compressed on its own; that is, for the first entry of each page, $\ell_1 = 0$ (no characters are copied from a preceding entry).
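As a rough illustration of the prefix omission idea mentioned above, the following Python sketch stores each entry of a sorted page as a pair (number of leading characters shared with the previous entry, remaining suffix). The function names `pom_encode` and `pom_decode` and the sample word list are assumptions made for this example; the sketch does not include the Huffman or Fibonacci coding layers discussed in the paper.

```python
def pom_encode(words):
    """Encode a sorted word list as (copy_length, suffix) pairs.

    The first pair always has copy_length 0, because a page is
    compressed on its own and has no preceding entry to copy from.
    """
    pairs = []
    prev = ""
    for word in words:
        # Length of the longest common prefix with the previous entry.
        n = 0
        while n < min(len(prev), len(word)) and prev[n] == word[n]:
            n += 1
        pairs.append((n, word[n:]))
        prev = word
    return pairs


def pom_decode(pairs):
    """Rebuild the word list by copying prefixes from the previous entry."""
    words = []
    prev = ""
    for copy_length, suffix in pairs:
        word = prev[:copy_length] + suffix
        words.append(word)
        prev = word
    return words


# Hypothetical page of a dictionary, already in sorted order.
page = ["compress", "compressed", "compression", "compute", "computer"]
pairs = pom_encode(page)
# pairs == [(0, "compress"), (8, "ed"), (8, "ion"), (4, "ute"), (7, "r")]
```

Decoding an entry in the middle of a page requires the entries before it, which is one reason why restarting the scheme on every page keeps random access cheap.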

Summary

Introduction

The problem of Compressed Pattern Matching, introduced by Amir and Benson [1], is that of performing pattern matching directly in a compressed text, without any decompression. Most research efforts in compressed matching were invested in what could be called "classical" texts. These are texts written generally in some natural language, which have been compressed by one of a variety of known compression techniques, such as Huffman coding [3] or variants of the Lempel and Ziv (LZ) methods, including LZW [4,5,6], gzip, DoubleSpace and many others [7,8,9]. Note that these methods are general-purpose and not restricted to the compression of natural language texts.
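To make the idea of searching for E(P) in E(T) concrete, here is a minimal Python sketch that uses the standard Fibonacci code as the encoding E. It is not the algorithm of the paper. The sketch assumes that text and pattern are already given as sequences of positive integers (for instance symbol ranks), and the names `fib_encode`, `encode`, `codeword_starts` and `compressed_find` are made up for this example. Every Fibonacci codeword ends in "11" and contains no other "11", so codeword boundaries can be found without decoding any values; a bit-level match counts only if it starts on such a boundary.

```python
def fib_encode(n):
    """Fibonacci codeword of an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Fibonacci codes are defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    # Greedy Zeckendorf representation, largest Fibonacci number first.
    for f in reversed(fibs[:-1]):
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Emit the least significant bit first, then the terminating '1'.
    return "".join(reversed(bits)) + "1"


def encode(symbols):
    """E: concatenate the codewords of a sequence of positive integers."""
    return "".join(fib_encode(s) for s in symbols)


def codeword_starts(bits):
    """Bit offsets at which codewords begin, found without decoding."""
    starts = []
    pos = 0
    while pos < len(bits):
        starts.append(pos)
        # The first '11' at or after pos ends the current codeword.
        pos = bits.index("11", pos) + 2
    return starts


def compressed_find(enc_text, enc_pat):
    """Symbol positions where enc_pat occurs in enc_text on a boundary."""
    symbol_at = {bit: i for i, bit in enumerate(codeword_starts(enc_text))}
    hits = []
    i = enc_text.find(enc_pat)
    while i != -1:
        if i in symbol_at:  # reject matches that straddle codewords
            hits.append(symbol_at[i])
        i = enc_text.find(enc_pat, i + 1)
    return hits


text = [1, 1, 1, 2, 1, 1, 4, 1]
pattern = [1, 1]
hits = compressed_find(encode(text), encode(pattern))
```

The boundary check matters: in this example the bit string of the pattern also occurs at offsets that fall inside or across codewords, and those occurrences do not correspond to the pattern in the original text.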

Journal: Algorithms
Publication Date: Mar 22, 2011
Citations: 35
License type: CC BY 3.0

Similar Papers

The order-preserving pattern matching problem in practice
Domenico Cantone ... M Oğuzhan Külekci
Discrete Applied Mathematics | VOL. 274
19 Nov 2018

Compression of concordances in full-text retrieval systems
Y Choueka ... A S Fraenkel
01 Jan 1987

Design and implementation of a Chinese full-text retrieval system based on a probabilistic model
Xiangji Huang ... Aijun An
19 Oct 1993

Non-Binary Robust Universal Variable Length Codes
Shmuel T Klein ... Dana Shapira
01 Mar 2020
