Abstract

We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in time within O(n(1 + lg α)) ⊆ O(n lg n), where the alternation α ∈ [1..n − 1] approximates the minimal amount of sorting required by the computation. This asymptotic complexity is within a constant factor of the optimal in the algebraic decision tree computational model, in the worst case over all instances of size n and alternation α. Such results refine the state-of-the-art complexity of Θ(n lg n) in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952. Besides the new analysis technique, this improvement is obtained by combining a new algorithm, inspired by van Leeuwen’s algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with a relatively minor extension of Karp et al.’s deferred data structure to partially sort a multiset according to the queries performed on it (known since 1988). Preliminary experimental results on text compression by words show α to be polynomially smaller than n, which suggests improvements by at most a constant multiplicative factor in the running time for such applications.

Highlights

  • In order to consider the optimality of this running time, one must notice that this algorithm is not in the comparison model, but still in a quite restricted computational model, dubbed the algebraic decision tree computational model [13], composed of algorithms which can be modeled as a decision tree where decision nodes are based only on algebraic operations with a finite number of operators

  • In the algebraic decision tree computational model, the complexity of the algorithm suggested by Huffman [1] is asymptotically optimal for any constant value of D, in the worst case over instances composed of n positive weights, as computing the optimal prefix free code for the ( D × n + 1) weights

  • Such examples of “easy instances” suggest that it could be possible to compute optimal prefix free codes in much less time by taking advantage of some measure of “easiness”, and Belal and Elmasry [16,17] proposed an algorithm claimed to perform within O(kn) algebraic operations, in the worst case over instances formed by n weights such that the binary prefix free code obtained by Huffman’s method [1] has exactly k distinct code lengths
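To make the parameter k from the last highlight concrete, here is a minimal Python sketch that computes binary code lengths from weights that are already sorted, then counts the distinct lengths. It uses the two-queue method commonly attributed to van Leeuwen, as it is usually described in textbooks; it is not the algorithm contributed by this paper and uses no deferred data structure. The function names `code_lengths_from_sorted` and `distinct_code_lengths` are hypothetical, chosen only for this illustration.

```python
from collections import deque


def code_lengths_from_sorted(weights):
    """Binary code lengths for weights given in non-decreasing order.

    Illustrative two-queue sketch (assumption: input is already sorted).
    """
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]  # convention: a single message still gets one symbol

    leaves = deque((w, i) for i, w in enumerate(weights))  # sorted leaves
    internal = deque()  # merged nodes, created in non-decreasing weight order
    parent = [None] * (2 * n - 1)
    next_id = n  # internal nodes are numbered n .. 2n-2

    def pop_smallest():
        # The lightest remaining node is at the front of one of the two queues.
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        w1, a = pop_smallest()
        w2, b = pop_smallest()
        parent[a] = parent[b] = next_id
        internal.append((w1 + w2, next_id))
        next_id += 1

    # Every parent has a larger id than its children, so one downward pass
    # from the root (id 2n-2) yields all depths.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]


def distinct_code_lengths(weights):
    """Number k of distinct code lengths for sorted weights."""
    return len(set(code_lengths_from_sorted(weights)))
```

With eight equal weights every code string has length 3, so k = 1; with the weights 1, 2, 4, 8, 16 the lengths are 4, 4, 3, 2, 1, so k = 4. Once the input is sorted, the two queues need only a linear number of operations.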

Summary

Introduction

Given n positive weights W[1..n] of n messages and a constant number D of (output) symbols, an optimal prefix free code [1] is a set of n code strings on the alphabet [1..D], of variable lengths L[1..n], such that no string is a prefix of another and the average length of a code is minimized (i.e., ∑_{i=1}^{n} L[i]W[i] is minimal). (We use the concise and general terminology of messages for the input and symbols for the output, as introduced by Huffman [1] himself; it should not be confused with other terminologies found in the literature, such as input symbols, letters, or words for the input, and output symbols, or bits in the binary case, for the output.) The particularity of such codes is that even though the code strings assigned to the messages can differ in length (assigning shorter ones to more frequent messages yields compression to ∑_{i=1}^{n} L[i]W[i] symbols), the prefix free property ensures unambiguous decoding. This, in turn, has applications to the compression of general texts, via the compression of the permutations appearing in the Burrows-Wheeler transform of the text.
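The definition above can be illustrated with a short Python sketch of the classical heap-based version of Huffman's method [1] for the binary case (D = 2), which runs within O(n lg n) on unsorted weights. This is the baseline that the paper refines, not the paper's own algorithm. The names `huffman_code_lengths`, `code_cost`, and `kraft_sum` are hypothetical helpers introduced only for this example, and giving length 1 to a single message is an assumed convention.

```python
import heapq


def huffman_code_lengths(weights):
    """Binary code lengths L for unsorted positive weights W (Huffman's method)."""
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]  # assumed convention for a single message

    parent = [None] * (2 * n - 1)
    heap = [(w, i) for i, w in enumerate(weights)]  # (weight, node id)
    heapq.heapify(heap)
    next_id = n  # internal nodes are numbered n .. 2n-2
    while len(heap) > 1:
        w1, a = heapq.heappop(heap)  # the two lightest nodes
        w2, b = heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (w1 + w2, next_id))
        next_id += 1

    # Parents always have larger ids than their children, so depths can be
    # filled in a single pass from the root (id 2n-2) downward.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]


def code_cost(weights, lengths):
    """The minimized quantity: sum over i of L[i] * W[i]."""
    return sum(l * w for l, w in zip(lengths, weights))


def kraft_sum(lengths):
    """Sum of 2^(-L[i]); at most 1 for any binary prefix free code."""
    return sum(2.0 ** -l for l in lengths)
```

For the weights 1, 1, 2, 4, this sketch yields the lengths 3, 3, 2, 1 and a cost of 14 symbols. The Kraft sum equals 1, which is consistent with the lengths describing a full binary code tree.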

Background
Question
Contributions
Solution
General Intuition
Partial Sum Deferred Data Structure
Conclusion
Analysis
Running Time Upper Bound
Lower Bound
Preliminary Experimentations
Discussion
Relation to Previous Work
Previous Work on Optimal Prefix Free Codes
Applicability of Dynamic Results on Deferred Data Structures
Applicability of Refined Results on Deferred Data Structures
Instance Optimality
In Classical Computational Models and Applications
Generalisation to Non Binary Output Alphabets
External Memory
Variants of the Optimal Prefix Free Code Problem