Weighted forward looking adaptive coding

Aharon Fruchtman, Yoav Gross, Shmuel T Klein, Dana Shapira

doi:10.1016/j.tcs.2022.07.013

Abstract

Huffman coding is known to be optimal under certain constraints, yet its dynamic version, which constantly alters the Huffman tree as a function of the already processed characters, may be even more efficient in practice. A forward looking variant of Huffman compression was proposed recently that provably always outperforms static Huffman coding by at least m−1 bits, where m denotes the size of the alphabet, and that has a better worst case size than standard dynamic Huffman coding. This paper introduces a new generic coding method that extends the known static and dynamic variants and includes them as special cases. In fact, the generalization is applicable to all statistical methods, including arithmetic coding. This then leads to the formalization of a new double-pass coding method. The method is adaptive in the sense that it uses changing statistics depending on the current position within the processed file, yet it behaves like static coding, as it assumes knowledge of the distribution in the entire file; this is contrary to online variants, which rely only on the text seen so far and adapt the model dynamically. We call the new method positional coding, and its compression performance, using global statistics, is provably always at least as good as that of the best dynamic variants known to date. Moreover, we present empirical results that show improvements by positional coding and its extensions over static and dynamic Huffman and arithmetic coding, even when the encoded file includes the model description.
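The contrast drawn above between static, backward-adaptive (online) and forward looking statistics can be illustrated with a minimal sketch. It is not the positional coding of this paper, and it does not build Huffman trees: it assumes an idealized arithmetic-style coder that spends -log2(p) bits per character, and it ignores the cost of transmitting the model. All function names are hypothetical. The forward looking model is assumed to start from the character counts of the entire file and to decrement the count of each character once it has been encoded; the backward-adaptive model is assumed to start every count at 1.

```python
import math
from collections import Counter


def static_cost(text):
    """Ideal cost in bits with one fixed distribution for the whole file."""
    counts = Counter(text)
    n = len(text)
    return sum(-math.log2(counts[c] / n) for c in text)


def backward_adaptive_cost(text):
    """Ideal cost when the model only knows the text seen so far.

    Assumption of this sketch: every alphabet symbol starts with count 1.
    """
    seen = {c: 1 for c in set(text)}
    total = len(seen)
    bits = 0.0
    for c in text:
        bits += -math.log2(seen[c] / total)
        seen[c] += 1  # learn from the character just processed
        total += 1
    return bits


def forward_looking_cost(text):
    """Ideal cost when the model knows what is still to come.

    Counts start at the global frequencies and shrink as the file is
    processed, so the statistics depend on the current position.
    """
    remaining = Counter(text)
    left = len(text)
    bits = 0.0
    for c in text:
        bits += -math.log2(remaining[c] / left)
        remaining[c] -= 1  # this occurrence will not appear again
        left -= 1
    return bits


if __name__ == "__main__":
    sample = "abracadabra"
    print("static          :", round(static_cost(sample), 3))
    print("backward adapt. :", round(backward_adaptive_cost(sample), 3))
    print("forward looking :", round(forward_looking_cost(sample), 3))
```

Under these assumptions, the forward looking cost equals the base-2 logarithm of the number of distinct arrangements of the file's characters, which never exceeds the static cost. This mirrors, for the idealized coder only, the kind of guarantee stated above for the Huffman variants.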

Full Text