Abstract

Coding methods such as Huffman and arithmetic coding exploit the skewness of the character distribution, i.e. the distributional properties of the data. Huffman coding assigns shorter codes to more frequently occurring characters to obtain compression, while arithmetic coding assigns larger intervals (code ranges) to characters with higher probabilities of occurrence. In this article, we present a scheme that enhances the model underlying both Huffman and arithmetic coding by exploiting the locality of character reference, i.e. the tendency of consecutive characters to fall within the same type (e.g. letters, digits, trailing blanks, runs of zeros). The basic idea is to split the character set into groups according to this locality-of-reference behavior and to derive a new probability for each character that corresponds to its frequency of occurrence within its group. This modification can be applied to both coding methods to improve compression efficiency. The article concludes with a discussion of hardware assistance for data encoding. Successful implementation of compression chips would be a significant enhancement to data-encoding technology and would greatly reduce the cost of data transmission and data access within information processing machines and distributed information systems.
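To illustrate the grouping idea, the following Python sketch (not the authors' exact scheme) partitions characters into hypothetical groups by type, derives each character's frequency within its own group rather than globally, and builds a separate Huffman code per group. The group names and the classification rule are illustrative assumptions, as is the choice of standard Huffman construction for the per-group codes.

```python
# Minimal sketch: per-group Huffman codes based on within-group frequencies.
# Grouping rule and group names are illustrative assumptions, not the paper's.
import heapq
from collections import Counter
from itertools import count


def group_of(ch: str) -> str:
    """Assign a character to a locality group (illustrative rule)."""
    if ch.isalpha():
        return "letters"
    if ch.isdigit():
        return "digits"
    if ch.isspace():
        return "blanks"
    return "other"


def huffman_codes(freqs: dict) -> dict:
    """Standard Huffman construction over a frequency table."""
    if len(freqs) == 1:                      # degenerate one-symbol group
        return {next(iter(freqs)): "0"}
    tiebreak = count()                       # unique key so dicts are never compared
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def per_group_codes(text: str) -> dict:
    """Build a separate Huffman code for each locality group in the text."""
    group_freqs = {}
    for ch in text:
        group_freqs.setdefault(group_of(ch), Counter())[ch] += 1
    return {g: huffman_codes(f) for g, f in group_freqs.items()}


if __name__ == "__main__":
    sample = "Invoice 00042: total 100 USD   "
    for group, table in per_group_codes(sample).items():
        print(group, table)
```

Because each code table only has to distinguish characters inside one group, frequently used characters of a group can receive shorter codewords than they would under a single global model; the same within-group probabilities could equally be fed to an arithmetic coder.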


