Modeling by MDL criterion for adaptive data compression

Hidetoshi Yokoo

doi:10.1002/ecja.4410710201

Abstract

This paper proposes a high-performance noiseless data compression model that can compress strings generated by a binary Markov information source without prior knowledge of the source. The adaptive form of the existing universal codes is described by a model with a separate sampler and encoder, and the learning structure and encoding process of each code are described. This discussion clarifies the intuitive reasons why these codes, despite their asymptotic optimality, compress poorly at the start of a string, and it points out room for improvement. The MDL (minimum description length) criterion is then introduced into each encoder to improve performance while keeping the sampler fixed. The MDL criterion estimates the minimum code length, including the cost of representing the parameters. Based on this criterion, the Markov order to be assumed for each symbol is determined adaptively. Since the decoder can make the same determination, unique decodability is guaranteed. The second half of the paper discusses more detailed techniques and demonstrates a remarkable improvement by computer simulation. It is also shown that the sampler of the proposed universal code achieves high performance, realizing two kinds of encoders.
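
The order-selection idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the textbook two-part MDL form (empirical code length plus half of log2 n bits per free parameter), and the function names `mdl_code_length` and `select_order` as well as the `max_order` parameter are hypothetical. Applying `select_order` only to the symbols already seen mirrors the point that a decoder can repeat the same determination.

```python
import math
from collections import Counter


def mdl_code_length(bits: str, order: int) -> float:
    """Two-part description length (in bits) of a binary string under a
    Markov model of the given order. Assumed textbook form, not the
    paper's exact criterion."""
    n = len(bits) - order  # number of symbols coded with a full context
    if n <= 0:
        return float("inf")  # not enough data to use this order
    ctx_counts = Counter()
    pair_counts = Counter()
    for i in range(order, len(bits)):
        ctx = bits[i - order:i]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, bits[i])] += 1
    # Data cost: empirical conditional code length of the coded symbols.
    data_cost = -sum(
        count * math.log2(count / ctx_counts[ctx])
        for (ctx, _symbol), count in pair_counts.items()
    )
    # The first `order` symbols have no full context; charge 1 bit each.
    data_cost += order
    # Parameter cost: one free parameter per context (2**order contexts),
    # each charged (1/2) * log2(n) bits.
    param_cost = 0.5 * (2 ** order) * math.log2(n)
    return data_cost + param_cost


def select_order(bits: str, max_order: int = 4) -> int:
    """Return the Markov order in 0..max_order with the smallest MDL cost."""
    return min(range(max_order + 1), key=lambda k: mdl_code_length(bits, k))


if __name__ == "__main__":
    history = "01" * 100  # strictly alternating source
    print(select_order(history))
```

With a short history the parameter cost dominates, so low orders win; higher orders are chosen only once enough symbols have been seen to pay for the extra parameters.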
