Abstract
Some of the best compression ratios in text compression are achieved by the PPM (prediction by partial matching) class of algorithms. These algorithms are based on arithmetic coding using a fixed-depth Markov chain model of the source, i.e., the subsequence of symbols generated in any state $s$ of the source is assumed to be the output of a memoryless subsource $w = w(s)$. One of the most crucial steps in any PPM algorithm is the choice of the escape probability, i.e., the probability of a previously unseen symbol appearing in a given state. Let $A = \{1, \dots, M\}$ be the source alphabet, let $x^k = x_1 x_2 \dots x_k \in A^k$ be a sequence of symbols generated by a subsource $w$ with unknown parameters, and let $m = m(x^k)$ be the number of distinct symbols in $x^k$. In most incarnations of PPM, the escape probability used for coding has the form $\vartheta(E \mid x^k) = \frac{\kappa(k,m)}{k + m\alpha + \kappa(k,m)}$, where $\alpha$ is a parameter and $\kappa(k,m)$ depends only on $k$ and $m$. The encoding of the escape symbol $E$ is followed by a description of the new symbol; this description is independent of the choice of escape probability and is not discussed here. In almost all PPM schemes, the expression for the escape probability has been chosen heuristically. Following the method of Shtarkov et al. (see Problems of Information Transmission, vol. 31, no. 2, pp. 20-35, 1995), we use a universal coding approach.
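The general escape-probability form can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the authors' method. First, it assumes that a previously seen symbol $a$ with count $n_a$ in $x^k$ receives probability $(n_a + \alpha)/(k + m\alpha + \kappa(k,m))$, chosen so that the symbol and escape probabilities in a context sum to one. Second, the table mapping the classic PPM escape methods A-D onto $(\alpha, \kappa)$ is an illustrative assumption and is not stated in the abstract. The names `context_probabilities` and `VARIANTS` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) keeps the sum-to-one check exact.

```python
from collections import Counter
from fractions import Fraction
from typing import Callable, Dict, Hashable, Sequence, Tuple

# kappa is given as a function of (k, m), mirroring kappa(k, m) in the escape formula.
KappaFn = Callable[[int, int], Fraction]

# Hypothetical, illustrative mapping of classic PPM escape methods onto
# escape = kappa(k, m) / (k + m*alpha + kappa(k, m)). An assumption, not from the abstract.
VARIANTS: Dict[str, Tuple[Fraction, KappaFn]] = {
    "A": (Fraction(0), lambda k, m: Fraction(1)),         # escape = 1 / (k + 1)
    "B": (Fraction(-1), lambda k, m: Fraction(m)),        # escape = m / k
    "C": (Fraction(0), lambda k, m: Fraction(m)),         # escape = m / (k + m)
    "D": (Fraction(-1, 2), lambda k, m: Fraction(m, 2)),  # escape = m / (2k)
}


def context_probabilities(history: Sequence[Hashable], alpha: Fraction, kappa: KappaFn):
    """Return (escape probability, {symbol: probability}) for one context.

    history is x^k, the symbols seen so far in this context (k >= 1).
    With D = k + m*alpha + kappa(k, m), the escape symbol gets kappa / D and,
    by assumption, each seen symbol a with count n_a gets (n_a + alpha) / D.
    """
    if not history:
        raise ValueError("need at least one symbol in the context (k >= 1)")
    counts = Counter(history)
    k, m = len(history), len(counts)  # m = number of distinct symbols in x^k
    kap = kappa(k, m)
    denom = k + m * alpha + kap
    escape = kap / denom
    symbols = {a: (n + alpha) / denom for a, n in counts.items()}
    return escape, symbols


# Usage: x^k = "abracadabra" has k = 11 and m = 5.
for name, (alpha, kappa) in VARIANTS.items():
    esc, probs = context_probabilities("abracadabra", alpha, kappa)
    assert esc + sum(probs.values()) == 1  # escape plus seen symbols sum to one
    print(name, esc)
# A 1/12
# B 5/11
# C 5/16
# D 5/22
```

Parameterizing by $(\alpha, \kappa)$ makes the heuristic choices explicit: each method differs only in these two quantities, which is what makes a principled choice of the escape probability a well-posed question. The sketch stops at the probabilities; it does not perform arithmetic coding or describe the new symbol after an escape.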