Abstract

List update is a key step in Burrows-Wheeler transform (BWT) compression. Previous work has shown that careful study of the list update step leads to better BWT compression. Surprisingly, the theoretical study of list update algorithms for compression has lagged behind their use in practice. More precisely, the standard model of Sleator and Tarjan for list update assumes a linear cost of access, whereas compression incurs a logarithmic cost of access: accessing item i in the list has cost Theta(i) in the standard model but Theta(log i) in compression applications. These models have been shown, in general, not to be equivalent. This paper makes two contributions. (1) We give the first theoretical proof that the commonly used Move-To-Front (MTF) algorithm performs well under the logarithmic cost-of-access model of compression. This has long been known in practice, but a formal proof under this model was missing until now. (2) We further refine the online model to reflect its use in compression by applying the recently developed 'online algorithms with advice' model. This advice model was initially a purely theoretical construct in which the online algorithm has access to an all-powerful oracle during the computation. We show that, surprisingly, this seemingly unrealistic model can be used to produce better multi-pass compression algorithms. More precisely, we introduce an 'almost-online' list update algorithm, which we term BIB, that yields a compression scheme superior to schemes using standard online algorithms, in particular MTF and TIMESTAMP. For example, for the files in the standard Canterbury Corpus, the compression ratio of the scheme that uses BIB is 33.66 on average, while the compression ratios for the schemes that use MTF and TIMESTAMP are 34.25 and 36.30, respectively.
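
To make the difference between the two cost models concrete, the following minimal sketch runs Move-To-Front over a short string and totals the access cost both ways. It is an illustration only, not the paper's construction: the function names are hypothetical, and the Elias gamma code length (2*floor(log2 i) + 1 bits) is an assumed stand-in for the Theta(log i) cost of writing position i.

```python
def mtf_positions(data, alphabet):
    """Run Move-To-Front and return the 1-based position of each access."""
    lst = list(alphabet)            # initial list order (an assumption)
    positions = []
    for sym in data:
        i = lst.index(sym)          # 0-based position of the requested item
        positions.append(i + 1)     # record the 1-based position
        lst.insert(0, lst.pop(i))   # move the accessed item to the front
    return positions


def linear_cost(positions):
    """Standard model: accessing position i costs i."""
    return sum(positions)


def log_cost(positions):
    """Compression-style cost: bits to write i with an Elias gamma code.

    Assumed stand-in for Theta(log i); the paper may use a different code.
    """
    return sum(2 * (i.bit_length() - 1) + 1 for i in positions)


positions = mtf_positions("banana", "abn")
print(positions)               # [2, 2, 3, 2, 2, 2]
print(linear_cost(positions))  # 13
print(log_cost(positions))     # 18
```

On an input this small the two totals are close; the models diverge for accesses deep in a long list, where the linear cost grows with i while the logarithmic cost grows only with the number of bits in i.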
