Text compression using recency rank with context and relation to context sorting, block sorting and PPM/sup */

K Sadakane

doi:10.1109/sequen.1997.666925

Abstract

A block sorting compression scheme was developed and its relation to a statistical scheme was studied, but a theoretical analysis of its performance has not been studied fully. Context sorting is a compression scheme based on context similarity and it is regarded as an on-line version of block sorting and it is asymptotically optimal. However, the compression speed is slower and the real performance is not better. We propose a compression scheme using recency rank code with context (RRC), which is based on context similarity. The proposed method encodes characters to recency ranks according to their contexts. It can be implemented using suffix tree and the recency rank code is realized by move-to-front transformation of edges in the suffix tree. It is faster than context sorting and is also asymptotically optimal. The performance is improved by changing models according to the length of the context and by combining some characters into a code. However, it is still inferior to block sorting in both performance and speed. We investigate the reason for the bad performance and we also prove the asymptotical optimality of a variation of block sorting and derive the relation among the RRC, context sorting, block sorting and PPM/sup */ clear.

Full Text