Abstract

We propose an efficient, sub-optimal prefix code construction method for discrete sources with a finite alphabet and known probability mass function (pmf). It is well known that for a source that puts out symbols x i with probability p i , the optimal codeword lengths are l i = log ( 1 / p i ) . However, codeword lengths are integers and log ( 1 / p i ) is, in general, not an integer. We propose a method to find binary codewords for x i whose lengths are initially assumed to be ⌈ log ( 1 / p i ) ⌉ − 1 . Every prefix code must satisfy the Kraft's inequality but our initial codeword lengths may not satisfy the Kraft's inequality. Using a simplified version of the subset sum problem we find a minimal set of codeword lengths that must be increased from ⌈ log ( 1 / p i ) ⌉ − 1 to ⌈ log ( 1 / p i ) ⌉ , so that Kraft's inequality is satisfied. Even though this solution is not optimal it leads to average codeword lengths that are close to optimal and in some cases codeword lengths that are optimal. Unlike the Huffman code, our solution does not require the ordering of probabilities in the pmf. The efficiency of our method can be further improved by reducing the size of the subset sum problem. The example of English text shows that our method leads to a solution that is very close to the optimal solution. The proposed method can also be used for encryption, thereby accomplishing both compression and encryption simultaneously.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call