Abstract

Summarizes some recent work in the field of universal data compression, both from a probabilistic and a deterministic point of view. In the probabilistic setting, the data x to be compressed is generated by a source from a certain class {P(x|/spl theta/),/spl theta//spl isin//spl Lambda/}. The capacity C of the channel from /spl theta/ to a is well known to be the minimum attainable redundancy of universal lossless codes w.r.t. this class, both in the minimax sense and in the Bayesian (maximin) sense. We show that C is essentially a lower bound also in a stronger sense, that is, for most sources in the class. This extends Rissanen's lower bound for parametric families. In the deterministic setting, an individual sequence is assumed without any statistical assumptions, and we seek a universal lossless encoder that is essentially no worse than the best encoder in a certain class of encoders, e.g., finite-state (FS) encoders, block encoders, etc. We characterize the the minimum attainable redundancy of universal coding w.r.t the best encoder in this class. For the class of FS encoders, we derive a bound that holds for most sequences in a certain sense, which is analogous to the classical probabilistic minimum description length (MDL) principle. The bound is attainable sequentially by a suitable arithmetic code, and therefore a deterministic significance of the predictive MDL concept is established. Finally, we extend this approach to lossy compression and obtain results in the same spirit for block-to-block and block-to-variable length codes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call