Abstract

When data come from a “huge” alphabet, the previous results may no longer apply with satisfying theoretical guarantees, especially when those results are asymptotic. By a “huge” alphabet we mean, for instance, one in which some letters have not yet occurred in the data. To understand how to cope with such situations, we study the case where the alphabet is infinite. For a finite alphabet, we have seen that there exist universal codes over the class of stationary ergodic sources. For classes of memoryless or Markovian sources, minimax redundancy and regret are both asymptotically equivalent to half the number of parameters times the base-2 logarithm of the encoded word length. For the non-parametric class of renewal sources, minimax redundancy and regret grow at the same asymptotic rate, up to multiplicative constants. None of this extends to infinite alphabets: there is no weakly universal code over the class of stationary ergodic sources, and we will see examples of classes for which the regret is infinite whereas the minimax redundancy is finite. The chapter starts with an encoding of the integers, which will be useful in the design of other codes. Thanks to a theorem due to John Kieffer, we show that there is no weakly universal code over the class of stationary ergodic sources with values in a countable alphabet. We then focus on memoryless sources (sequences of i.i.d. random variables) and use the Minimax-Maximin Theorem 2.12 to obtain lower bounds on the minimax redundancy of classes characterized by the decay of the probability measure at infinity. Another approach is to code in two steps: first encode the observed alphabet (the letters occurring in the data), then encode what is known as the “pattern”, which records the positions of letter repetitions, with letters taken in their order of occurrence.
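The two-step decomposition mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It assumes the usual convention that the pattern replaces each letter by the rank of its first occurrence (1 for the first distinct letter seen, 2 for the second, and so on). The function names `split_alphabet_and_pattern` and `rebuild` are hypothetical, chosen here for illustration, and the sketch only shows the decomposition itself, not how either part would be compressed.

```python
def split_alphabet_and_pattern(word):
    """Split a word into its observed alphabet and its pattern.

    Assumed convention: the observed alphabet lists distinct letters in
    order of first occurrence, and the pattern replaces each letter by
    the 1-based rank of its first occurrence.
    """
    rank_of = {}    # letter -> 1-based rank of first occurrence
    alphabet = []   # distinct letters, in order of first occurrence
    pattern = []    # one rank per position of the word
    for letter in word:
        if letter not in rank_of:
            alphabet.append(letter)
            rank_of[letter] = len(alphabet)
        pattern.append(rank_of[letter])
    return alphabet, pattern


def rebuild(alphabet, pattern):
    """Recover the original sequence of letters from the two parts."""
    return [alphabet[rank - 1] for rank in pattern]


alphabet, pattern = split_alphabet_and_pattern("abracadabra")
print(alphabet)  # ['a', 'b', 'r', 'c', 'd']
print(pattern)   # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

Since `rebuild(alphabet, pattern)` returns the original letters, no information is lost by the split. The pattern takes values in the positive integers whatever the source alphabet is, which is what makes it a natural object to encode separately when the alphabet is infinite.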
