Abstract

Huffman coding is a successful compression method originally used for text compression. In any text, some characters occur far more frequently than others; in English text, for example, the letters E, A, O, and T are normally used much more often than J, Q, and X. Huffman's idea is, instead of using a fixed-length code such as 8-bit extended ASCII or EBCDIC for every symbol, to represent a frequently occurring character in a source with a shorter codeword and a less frequently occurring one with a longer codeword. The total number of bits in this representation is therefore significantly reduced for a source whose symbols occur with different frequencies, and the number of bits required per symbol is reduced on average. Statistical models and a heuristic approach give rise to the celebrated static Huffman and Shannon–Fano algorithms: the Huffman algorithm takes a bottom-up approach, while Shannon–Fano works top-down. Implementation issues make the Huffman code more popular than Shannon–Fano's, and maintaining two tables may improve the efficiency of the Huffman encoding algorithm. However, Huffman codes can give poor compression performance when the alphabet is small and the probability distribution of the source is skewed. In this case, extending the small alphabet and encoding the source in small groups of symbols may improve the overall compression.
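
To illustrate the bottom-up construction mentioned above, the following is a minimal Python sketch (not the paper's implementation); the function name huffman_codes and the sample string are assumptions chosen for demonstration only. It repeatedly merges the two least frequent nodes so that frequent symbols end up near the root and receive short codewords.

import heapq
from collections import Counter

def huffman_codes(text):
    """Return a {symbol: codeword} map built from symbol frequencies in text."""
    freq = Counter(text)
    # Heap entries are (frequency, tie-breaker, node); a node is either a
    # leaf symbol or a pair of child nodes.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:                       # degenerate single-symbol source
        return {heap[0][2]: "0"}
    while len(heap) > 1:                 # bottom-up: merge the two rarest nodes
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse into children
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                            # leaf: record the codeword
            codes[node] = prefix
    assign(heap[0][2], "")
    return codes

if __name__ == "__main__":
    sample = "this is an example of a huffman tree"
    codes = huffman_codes(sample)
    encoded = "".join(codes[c] for c in sample)
    print(codes)                         # frequent symbols (e.g. ' ', 'a') get short codes
    print(len(encoded), "bits vs", 8 * len(sample), "bits in 8-bit ASCII")

Running the example shows the space and the most common letters receiving codewords of two or three bits, while rare letters receive longer ones, which is the source of the overall saving the abstract describes.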
