Abstract

The problem of determining the best achievable performance of arbitrary lossless compression algorithms is examined, when correlated side information is available at both the encoder and decoder. For arbitrary source-side information pairs, the conditional information density is shown to provide a sharp asymptotic lower bound for the description lengths achieved by an arbitrary sequence of compressors. This implies that for ergodic source-side information pairs, the conditional entropy rate is the best achievable asymptotic lower bound to the rate, not just in expectation but with probability one. Under appropriate mixing conditions, a central limit theorem and a law of the iterated logarithm are proved, describing the inevitable fluctuations of the second-order asymptotically best possible rate. An idealised version of Lempel-Ziv coding with side information is shown to be universally first- and second-order asymptotically optimal, under the same conditions. These results are in part based on a new almost-sure invariance principle for the conditional information density, which may be of independent interest.
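For orientation, the central quantity in the abstract is the conditional information density, written here in standard notation (which may differ slightly from the paper's symbols): for a source-side information pair (X, Y) it is − log P(X_1^n = x_1^n | Y_1^n = y_1^n), and the corresponding conditional entropy rate is

    H(X|Y) = lim_{n→∞} (1/n) E[ − log P(X_1^n | Y_1^n) ]  bits/symbol.

Restating the first-order claim of the abstract: with probability one, no sequence of compressors with side information available at both encoder and decoder can asymptotically spend fewer than (1/n)[− log P(X_1^n | Y_1^n)] bits per symbol, and hence, for ergodic pairs, no fewer than H(X|Y) bits per symbol.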

Highlights

  • It is well-known that the presence of correlated side information can potentially offer dramatic benefits for data compression [1,2]

  • Important applications where such side information is naturally present include the compression of genomic data [3,4], file and software management [5,6], and image and video compression [7,8]

  • We consider an idealised version of a Lempel-Ziv compression algorithm, and we show that it can achieve asymptotically optimal first- and second-order performance, universally over a broad class of stationary and ergodic source-side information pairs ( X, Y )
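
As a rough illustration of that last point, the sketch below (Python) describes the source x by copying longest matches out of a shared side information string y, with an idealised bit count for pointers and lengths. This toy parser and its cost model are our own illustrative choices, not the paper's idealised Lempel-Ziv scheme; it only shows how a decoder that already holds a correlated sequence can reconstruct x from far fewer bits than a standalone description would need.

    # Toy LZ77-style sketch: describe x by copying longest matches out of the
    # side information y, which encoder and decoder are assumed to share.
    # This is only an illustration; it is NOT the paper's idealised scheme.
    import math
    import random

    def longest_match(x, i, y):
        """Longest prefix of x[i:] occurring somewhere in y: (position, length)."""
        best_pos, best_len = -1, 0
        for start in range(len(y)):
            length = 0
            while (i + length < len(x) and start + length < len(y)
                   and x[i + length] == y[start + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        return best_pos, best_len

    def toy_bits(x, y):
        """Idealised bit count for describing x to a decoder that knows y."""
        bits, i = 0.0, 0
        while i < len(x):
            pos, length = longest_match(x, i, y)
            if length == 0:
                bits += 1 + 8            # flag bit + one literal byte
                i += 1
            else:
                # flag bit + pointer into y + crude Elias-gamma-style length cost
                bits += 1 + math.log2(len(y)) + 2 * math.log2(length + 1)
                i += length
        return bits

    random.seed(0)
    n, p = 2000, 0.02                    # x agrees with y except ~2% resampled bytes
    y = [random.randrange(256) for _ in range(n)]
    x = [b if random.random() > p else random.randrange(256) for b in y]

    rate = toy_bits(x, y) / n
    q = 1 - p + p / 256                  # P(X_i = Y_i) under this toy model
    h_cond = -(q * math.log2(q) + 255 * (p / 256) * math.log2(p / 256))
    print(f"toy rate {rate:.2f} bits/symbol; H(X|Y) = {h_cond:.2f}; raw = 8.00")

The printout compares the toy coder's rate (under one bit per 8-bit symbol in runs like this one) with the 8 bits/symbol of a raw description and with the optimum H(X|Y), which is roughly 0.3 bits/symbol in this toy model.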

Summary

Introduction

It is well-known that the presence of correlated side information can potentially offer dramatic benefits for data compression [1,2]. Theorem 2 states that for any jointly stationary and ergodic source-side information pair (X, Y), the best asymptotically achievable compression rate is H(X|Y) bits/symbol, with probability 1. This generalises Kieffer's corresponding result [17] to the case of compression with side information. Under general conditions on the source-side information pair (X, Y), in Theorem 8 we show that the ideal description lengths, log R_n, can be well approximated by the conditional information density − log P(X_1^n | Y_1^n). Combining this with our earlier results on the conditional information density, in Corollary 1 and Theorem 9 we show that the compression rate of this scheme converges to H(X|Y), with probability 1, and that it is universally second-order optimal.
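As a quick numerical sanity check on these statements (using our own toy i.i.d. model, not anything from the paper), the snippet below samples the normalised conditional information density −(1/n) log P(X_1^n | Y_1^n) for a pair in which the Y_i are fair bits and X_i equals Y_i flipped with probability p. Its mean should be close to H(X|Y) = h(p), and n times its variance close to the conditional varentropy that governs the central limit theorem.

    # Monte Carlo check of the first- and second-order behaviour of the
    # conditional information density in a toy i.i.d. model:
    # Y_i fair bits, X_i = Y_i flipped with probability p.
    import math
    import random

    def h2(p):
        """Binary entropy function, in bits."""
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    random.seed(1)
    n, p, trials = 5_000, 0.1, 400
    samples = []
    for _ in range(trials):
        flips = sum(random.random() < p for _ in range(n))         # positions with X_i != Y_i
        info = -(flips * math.log2(p) + (n - flips) * math.log2(1 - p))
        samples.append(info / n)        # (1/n) * (- log P(X_1^n | Y_1^n))

    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    sigma2 = p * (1 - p) * math.log2((1 - p) / p) ** 2             # conditional varentropy
    print(f"empirical mean rate    {mean:.4f}  vs  H(X|Y) = h(p) = {h2(p):.4f}")
    print(f"empirical n * variance {n * var:.4f}  vs  varentropy   = {sigma2:.4f}")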

Preliminaries
The Conditional Information Density
First-Order Asymptotics
Finer Asymptotics
Asymptotics of the Conditional Information Density
Idealised LZ Compression with Side Information