Introduction

In the previous chapter, Chapter 3, we presented the theory of lossless coding and derived conditions for the optimality of uniquely decodable, prefix-free source codes. In particular, we showed that the entropy is the absolute lower limit on the rate of a prefix-free code, and we presented tree and arithmetic structures that support prefix-free codes. In this chapter, we shall present coding methods that utilize these structures and whose rates approach the entropy limit. These methods are given the generic name of entropy coding. Huffman coding is one common form of entropy coding; another is arithmetic coding, whose adaptive, context-based enhancements are part of several standard data compression methods. Nowadays, lossless codes, whether close to optimal or not, are often called entropy codes. In addition to Huffman and arithmetic coding, we shall develop other important lossless coding methods, including run-length coding, Golomb coding, and Lempel–Ziv coding.

Huffman codes

The construction invented by Huffman [1] in 1952 yields the minimum-length prefix-free code for a source with a given set of probabilities. First, we shall motivate Huffman's construction. We consider only binary (D = 2) codes in this chapter, since extensions to the non-binary case are usually obvious from the binary one, and binary codes are predominant in practice and in the literature. We have learned that a prefix-free code can be constructed whose average length exceeds the source entropy, the absolute lower limit, by no more than 1 bit.
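To make the construction concrete before the formal development, the following is a minimal sketch of binary Huffman coding, written in Python for illustration; the function name huffman_code and the example source are ours, not from the text. It repeatedly merges the two least probable nodes, and the resulting average length L satisfies H(X) <= L < H(X) + 1, the bound cited above.

```python
import heapq

def huffman_code(probs):
    """Build a prefix-free binary code by repeatedly merging the two
    least probable nodes (Huffman's construction)."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)  # tie-breaker keeps the dicts from ever being compared
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable node
        p1, _, c1 = heapq.heappop(heap)  # next least probable node
        # Prepend one bit to every codeword in each merged subtree.
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
code = huffman_code(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(code)  # a valid prefix-free code, e.g. codeword lengths (2, 2, 2, 3, 3)
print(avg)   # 2.2 bits/symbol; the entropy is about 2.12 bits
```

For this five-symbol source the average length is 2.2 bits against an entropy of about 2.12 bits, so the code is indeed within 1 bit of the entropy limit. Ties in the merging step may produce different trees, but every such tree attains the same minimum average length.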
