Abstract

We propose a storage scheme for a string S [ 1 , n ] , drawn from an alphabet Σ , that requires space close to the k -th order empirical entropy of S , and allows one to retrieve any substring of S of length ℓ in optimal O ( 1 + ℓ log | Σ | n ) time. This matches the best known bounds [R. González, G. Navarro, Statistical encoding of succinct data structures, in: Procs CPM, in: LNCS, vol. 4009, 2006, pp. 295–306; K. Sadakane, R. Grossi, Squeezing succinct data structures into entropy bounds, in: Procs ACM-SIAM SODA, 2006, pp. 1230–1239], via the use of binary encodings and tables only. We also apply our storage scheme to the Burrows–Wheeler Transform [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994], and achieve a space bound which depends on both the k -th order entropy of S and the k -th order entropy of its BW-transformed string bwt ( S ) .

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call