Canonical Huffman code based full-text index

Yi Zhang, Zhili Pei, Jinhui Yang, Yanchun Liang

doi:10.1016/j.pnsc.2007.11.001

Open Access

https://doi.org/10.1016/j.pnsc.2007.11.001

Abstract

Full-text indices are data structures that can be used to find any substring of a given string. Many full-text indices require more space than the original string. In this paper, we introduce the canonical Huffman code to the wavelet tree of a string $T[1 \ldots n]$. Compared with the Huffman code based wavelet tree, no memory is needed to represent the shape of the wavelet tree. For a large alphabet, this part of the memory is not negligible. The wavelet tree operations are also simpler and more efficient owing to the canonical Huffman code. Based on the resulting structure, the multi-key rank and select functions can be performed using at most $nH_0 + |\Sigma|(\lg\lg n + \lg n - \lg|\Sigma|) + O(nH_0)$ bits and in $O(H_0)$ time in the average case, where $H_0$ is the zeroth-order empirical entropy of $T$. Finally, we present an efficient construction algorithm for this index, which is on-line and runs in linear time.
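
To make the central idea of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a standard Huffman length computation, assigns canonical codes from the code lengths alone, and answers rank by walking a wavelet tree whose nodes are addressed by code prefixes. All function names (`huffman_code_lengths`, `canonical_codes`, `build_wavelet_tree`, `rank`) are hypothetical, and the bit sequences are plain Python lists scanned linearly, so the sketch illustrates the structure only and does not reach the space or time bounds stated above.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length} for a Huffman code of text."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap items: (weight, unique tiebreak, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge pushes these symbols one level down
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: order by (length, symbol) and count upward."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def build_wavelet_tree(text, codes):
    """Map each code prefix (a tree node) to the bits stored at that node."""
    nodes = {}
    for ch in text:
        code = codes[ch]
        for depth, bit in enumerate(code):
            nodes.setdefault(code[:depth], []).append(int(bit))
    return nodes


def rank(nodes, codes, ch, i):
    """Count occurrences of ch in text[:i] by following ch's code downward."""
    if ch not in codes:
        return 0
    prefix = ""
    for bit in codes[ch]:
        b = int(bit)
        # Naive binary rank; a succinct index would use a rank bitvector here.
        i = sum(1 for x in nodes[prefix][:i] if x == b)
        if i == 0:
            return 0
        prefix += bit
    return i


text = "abracadabra"
lengths = huffman_code_lengths(text)
codes = canonical_codes(lengths)
tree = build_wavelet_tree(text, codes)
print(codes)
print(rank(tree, codes, "a", len(text)))
```

Because `canonical_codes` depends only on the code lengths, the tree shape can be rebuilt from one length per symbol rather than from explicit tree pointers, which is the saving the abstract refers to. The number of nodes visited by `rank` equals the code length of the queried symbol, so frequent symbols are answered after fewer steps.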

Journal: Progress in Natural Science	Publication Date: Jan 28, 2008
Citations: 3	License type: cc-by-nc-nd

Similar Papers

A Compressed Format Index Based on the Wavelet Tree and Its Implement
Zhang Yi ... Lu Yan
01 Oct 2010

The myriad virtues of Wavelet Trees
Paolo Ferragina ... Giovanni Manzini
Information and Computation | VOL. 207
29 Jan 2009

A hardware design method for Canonical Huffman Code
Yi Chen ... Zi Wei Xia
01 Nov 2017

An Improved Image Compression Algorithm Using 2D DWT and PCA with Canonical Huffman Encoding.
Rajiv Ranjan ... Prabhat Kumar
Entropy | VOL. 25
25 Sep 2023
