Abstract

Full-text indices are data structures that can be used to find any substring of a given string. Many full-text indices require more space than the original string. In this paper, we apply the canonical Huffman code to the wavelet tree of a string $T[1 \ldots n]$. Compared with a wavelet tree based on an ordinary Huffman code, no memory is needed to represent the shape of the tree; for a large alphabet, this saving is not negligible. The canonical Huffman code also makes the wavelet tree operations simpler and more efficient. Based on the resulting structure, the multi-key rank and select functions can be performed using at most $nH_0 + |\Sigma|(\lg\lg n + \lg n - \lg|\Sigma|) + O(nH_0)$ bits and in $O(H_0)$ time on average, where $H_0$ is the zeroth-order empirical entropy of $T$. Finally, we present an efficient construction algorithm for this index, which is on-line and linear.
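To illustrate why a canonical Huffman code removes the need to store the tree shape, the following minimal Python sketch derives codewords from code lengths alone. It is not the paper's on-line, linear construction algorithm. The function names `huffman_code_lengths` and `canonical_codes` are hypothetical, chosen only for this example, and the tie-breaking rule (sort by length, then by symbol) is an assumed convention.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length} for a Huffman code on the symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codewords from the code lengths only (assumed ordering:
    by length, then by symbol). No tree shape is consulted or stored."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len      # pad with zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1                       # next codeword of the same length
        prev_len = length
    return codes


if __name__ == "__main__":
    lengths = huffman_code_lengths("abracadabra")
    print(canonical_codes(lengths))
    # {'a': '0', 'b': '100', 'c': '101', 'd': '110', 'r': '111'}
```

Because the codewords are determined entirely by the sorted code lengths, a decoder (or a wavelet tree built on these codes) can be reconstructed from a per-symbol length table rather than an explicit tree.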
