Abstract

The on-line sequence modelling algorithm ‘Prediction by Partial Matching’ (PPM) has set the performance standard in lossless data compression research since Moffat's 1990 implementation, PPMC. Despite intense research activity, only Howard's 1993 escape-count update mechanism ‘D’ has provided any consistent, order-independent performance improvement to PPMC (about 1%). Most notably, the recently introduced PPM variant, PPM*, which eliminates PPM's order bound, fails to offer compression results superior to those of PPMC with Markov order greater than four. This paper explains how to significantly improve the compression performance of any PPM variant (by 5–12%) by combining PPM's probability estimator, ‘blending’, with information-theoretic state selection. Hazards inherent to this combination are overcome by identifying the distinct semantics of the two approaches and resolving the differences using a dual-frequency update mechanism. We present and apply our percolating state selector, plus an enhancement to blending, both of which we have recently shown to independently outperform all competing techniques from the literature. We also give a minimal linear-space suffix-tree implementation of PPM and PPM*. Performance is measured in experiments run on the Calgary Corpus using our reimplementation of the original algorithms in an executable cross-product of independent model components, which permits precise control of all modelling algorithm features.
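For readers unfamiliar with the escape mechanisms named above, the following is a minimal sketch of the textbook single-context estimators usually labelled 'C' and 'D' in the PPM literature. The function name `escape_estimates` and the dictionary representation of a context are illustrative assumptions, not the paper's implementation; blending across orders, exclusion, and the state selection discussed in the abstract are all omitted.

```python
from fractions import Fraction


def escape_estimates(counts, method):
    """Return (symbol_probs, escape_prob) for a single context.

    counts: dict mapping symbol -> frequency observed in this context.
    method: "C" (as in PPMC) or "D" (Howard's variant).
    Hypothetical helper for illustration only.
    """
    n = sum(counts.values())  # total symbol occurrences in the context
    q = len(counts)           # number of distinct symbols seen so far
    if n == 0:
        raise ValueError("context has no observations")

    if method == "C":
        # Method C: the escape is counted once per distinct symbol.
        denom = n + q
        probs = {s: Fraction(c, denom) for s, c in counts.items()}
        escape = Fraction(q, denom)
    elif method == "D":
        # Method D: each novel symbol adds 1/2 to the symbol count and
        # 1/2 to the escape count, giving (2c - 1)/(2n) and q/(2n).
        denom = 2 * n
        probs = {s: Fraction(2 * c - 1, denom) for s, c in counts.items()}
        escape = Fraction(q, denom)
    else:
        raise ValueError("method must be 'C' or 'D'")
    return probs, escape


if __name__ == "__main__":
    context = {"a": 3, "b": 1}
    for m in ("C", "D"):
        probs, esc = escape_estimates(context, m)
        print(m, probs, esc)
```

In both cases the symbol probabilities and the escape probability sum to one within the context; the two methods differ only in how much mass is reserved for symbols not yet seen there.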
