Selection of prefix and postfix word fragments for data compression

T Radhakrishnan

doi:10.1016/0306-4573(78)90067-5

Abstract

In this paper a simple algorithm is used to select a set of codeable substrings that occur at the front or rear of the words in a textual data base. Since the words are assumed to be non-repeating, the technique is useful for data compression of dictionaries. The time complexity of the algorithm is governed by the associated sorting algorithm and hence is O(n log n). It has been applied to three sample data bases, consisting of words selected from street names, authors' names, or general written English text. The results show that the substrings at the rear of the words yield better compression than those at the front. By applying the results of an earlier study in compression coding, efficient encoding and decoding procedures are presented for use in on-line transmission of data.
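The abstract does not spell out the selection procedure, so the following Python sketch is only an illustration of how a sort-based selection of front or rear fragments might look, not the paper's algorithm. The function name `select_fragments`, the parameters `max_len` and `top_k`, and the scoring rule (a fragment of length n replaced by a single code symbol saves n - 1 characters per word) are all assumptions introduced here. Rear fragments are handled by reversing each word, so that words sharing a suffix become adjacent after sorting.

```python
from itertools import groupby


def select_fragments(words, max_len=4, top_k=3, rear=True):
    """Rank candidate word fragments by an assumed character saving.

    Hypothetical sketch: the scoring rule (n - 1) * count is an
    assumption, not taken from the paper.
    """
    # Words are treated as non-repeating; reversing each word turns
    # the rear-fragment problem into a front-fragment one.
    keys = sorted({w[::-1] if rear else w for w in words})

    scored = []
    for n in range(2, max_len + 1):
        # Filtering keeps the sorted order, so words that share the
        # same n-character fragment stay adjacent.
        candidates = [k for k in keys if len(k) >= n]
        for frag, group in groupby(candidates, key=lambda k: k[:n]):
            count = sum(1 for _ in group)
            if count > 1:
                scored.append(((n - 1) * count, frag))

    # Highest assumed saving first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(f[::-1] if rear else f, s) for s, f in scored[:top_k]]


words = ["walking", "talking", "reading", "street", "sheet", "walked"]
rear_fragments = select_fragments(words, rear=True)
front_fragments = select_fragments(words, rear=False)
```

In this sketch the sort dominates the cost, and each fragment length then needs only one linear pass over the sorted list, which matches the O(n log n) bound stated above when `max_len` is a small constant.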
