Abstract

Given a sequence S of n symbols over some alphabet Σ of size σ, we develop new compression methods that are (i) very simple to implement; (ii) provide O(1) time random access to any symbol (or short substring) of the original sequence. Our simplest solution uses at most 2h+o(h) bits of space, where h = n(H$_{0}$(S)+1), and H$_{0}$(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. For example, we can achieve n(H$_{k}$(S)+1)+o(n(H$_{k}$(S)+1)) bits of space, for k = o(log$_{σ}$(n)). Several applications are discussed, including text compression, (compressed) full-text indexing and string matching.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call