Optimal Prefix Free Codes with Partial Sorting

Jérémy Barbay

doi:10.3390/a13010012

Abstract

We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in time within O(n(1 + lg α)) ⊆ O(n lg n), where the alternation α ∈ [1..n−1] approximates the minimal amount of sorting required by the computation. This asymptotic complexity is within a constant factor of the optimal in the algebraic decision tree computational model, in the worst case over all instances of size n and alternation α. Such results refine the state-of-the-art complexity of Θ(n lg n) in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952. Besides the new analysis technique, this improvement is obtained by combining a new algorithm, inspired by van Leeuwen’s algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with a relatively minor extension of Karp et al.’s deferred data structure to partially sort a multiset according to the queries performed on it (known since 1988). Preliminary experimental results on text compression by words show α to be polynomially smaller than n, which suggests improvements by at most a constant multiplicative factor in the running time for such applications.
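For readers unfamiliar with the sorted-weights technique mentioned in the abstract, the following is a minimal Python sketch of the classical two-queue method usually attributed to van Leeuwen. It is not the partial-sorting algorithm of the paper. The function name `van_leeuwen_code_lengths`, the node numbering, and the convention of returning length 0 for a single message are assumptions made for this illustration.

```python
from collections import deque


def van_leeuwen_code_lengths(sorted_weights):
    """Code lengths of an optimal binary prefix free code.

    Assumes `sorted_weights` is already in non-decreasing order.
    Runs in time linear in the number of weights.
    """
    n = len(sorted_weights)
    if n == 0:
        return []
    if n == 1:
        return [0]  # illustrative convention for a single message

    # Leaves are nodes 0..n-1; internal nodes get ids n..2n-2 in creation order.
    leaves = deque((w, i) for i, w in enumerate(sorted_weights))
    internal = deque()  # weights are appended in non-decreasing order
    parent = [-1] * (2 * n - 1)
    next_id = n

    def pop_min():
        # The lightest remaining node is at the front of one of the two queues.
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        w1, a = pop_min()
        w2, b = pop_min()
        parent[a] = parent[b] = next_id
        internal.append((w1 + w2, next_id))
        next_id += 1

    # A parent always has a larger id than its children, so one pass
    # from the root (id 2n-2) downwards fills in all depths.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]


lengths = van_leeuwen_code_lengths([1, 1, 2, 3, 5])
```

Calling the function on `sorted(weights)` handles unsorted input, at the cost of the Θ(n lg n) sorting step that the paper aims to reduce.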

Highlights

In order to consider the optimality of this running time, one must notice that this algorithm is not in the comparison model, but still in a quite restricted computational model, dubbed the algebraic decision tree computational model [13], composed of algorithms which can be modeled as a decision tree where decision nodes are based only on algebraic operations with a finite number of operators
In the algebraic decision tree computational model, the complexity of the algorithm suggested by Huffman [1] is asymptotically optimal for any constant value of D, in the worst case over instances composed of n positive weights, as computing the optimal prefix free code for the ( D × n + 1) weights
Such examples of “easy instances” suggest that it could be possible to compute optimal prefix free codes in much less time by taking advantage of some measure of “easiness”, and Belal and Elmasry [16,17] proposed an algorithm claimed to perform within O(kn) algebraic operations, in the worst case over instances formed by n weights such that the binary prefix free code obtained by Huffman’s method [1] has exactly k distinct code lengths
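To make the parameter k concrete, here is a small Python sketch that runs the textbook heap-based version of Huffman's method and counts the distinct code lengths it produces. The helper names `huffman_code_lengths` and `distinct_code_lengths` are hypothetical. Ties between equal weights are broken here by node id, which is an arbitrary choice, and a different tie-breaking rule can yield a different k on some instances.

```python
import heapq


def huffman_code_lengths(weights):
    """Code lengths from Huffman's method (binary case), using a heap."""
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [0]  # illustrative convention for a single message

    # Heap entries are (weight, node id); ids break ties deterministically.
    heap = [(w, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    parent = [-1] * (2 * n - 1)
    next_id = n
    while len(heap) > 1:
        w1, a = heapq.heappop(heap)
        w2, b = heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (w1 + w2, next_id))
        next_id += 1

    # Parents have larger ids than their children, so depths can be
    # filled in a single pass from the root (id 2n-2) downwards.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]


def distinct_code_lengths(weights):
    """Number k of distinct code lengths in the code built above."""
    return len(set(huffman_code_lengths(weights)))


k_uniform = distinct_code_lengths([1, 1, 1, 1])   # all lengths equal
k_skewed = distinct_code_lengths([1, 2, 4, 8])    # lengths 3, 3, 2, 1
```

Uniform weights give a single code length, while geometrically growing weights give a number of distinct lengths that grows with n, which is the kind of gap between easy and hard instances the highlight refers to.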

Summary

Introduction

Given n positive weights W[1..n], representing the frequencies of n messages, and a constant number D of (output) symbols, an optimal prefix free code [1] is a set of n code strings on the alphabet [1..D], of variable lengths L[1..n], such that no string is a prefix of another and the average length of a code is minimized (i.e., ∑_{i=1}^{n} L[i] W[i] is minimal). (We use the concise and general terminology of messages for the input and symbols for the output, as introduced by Huffman [1] himself; it should not be confused with other terminologies found in the literature, such as input symbols, letters, or words for the input, and output symbols, or bits in the binary case, for the output.) The particularity of such codes is that even though the code strings assigned to the messages can differ in length (assigning shorter ones to more frequent messages yields compression to ∑_{i=1}^{n} L[i] W[i] symbols), the prefix free property ensures unambiguous decoding. This, in turn, has applications to the compression of general texts, via the compression of the permutations appearing in the Burrows-Wheeler transform of the text
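The definition above can be illustrated with a short Python sketch for the binary case (D = 2). It assigns code strings to given code lengths using the standard canonical-code construction, then decodes a concatenation of code strings symbol by symbol. The example weights and lengths are made up for the illustration, the function names are hypothetical, and the sketch assumes every length is at least 1 and that the lengths satisfy Kraft's inequality.

```python
def canonical_code(lengths):
    """Binary code strings with the given lengths, no one a prefix of another.

    Assumes all lengths are >= 1 and satisfy Kraft's inequality.
    """
    order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
    codes = [None] * len(lengths)
    code = 0
    prev_len = lengths[order[0]]
    for rank, i in enumerate(order):
        if rank > 0:
            # Next codeword: increment, then pad with zeros to the new length.
            code = (code + 1) << (lengths[i] - prev_len)
        codes[i] = format(code, "0{}b".format(lengths[i]))
        prev_len = lengths[i]
    return codes


def is_prefix_free(codes):
    """True if no code string is a prefix of another one."""
    ordered = sorted(codes)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def decode(bits, codes):
    """Decode a concatenation of code strings back into message indices."""
    table = {c: i for i, c in enumerate(codes)}
    out, current = [], ""
    for b in bits:
        current += b
        if current in table:  # prefix freeness makes this match unambiguous
            out.append(table[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code string")
    return out


W = [1, 1, 2, 3, 5]   # example weights (made up)
L = [4, 4, 3, 2, 1]   # example code lengths for those weights
codes = canonical_code(L)
cost = sum(l * w for l, w in zip(L, W))  # the quantity being minimized
message = [4, 3, 0, 4, 2]
encoded = "".join(codes[i] for i in message)
decoded = decode(encoded, codes)
```

The decoder never needs a separator between code strings: as soon as the bits read so far match a code string, that match is the only possible one.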

Background

Question

Contributions

Solution

General Intuition

Partial Sum Deferred Data Structure

Conclusion

Analysis

Running Time Upper Bound

Lower Bound

Preliminary Experimentations

Discussion

Relation to Previous Work

Previous Work on Optimal Prefix Free Codes

Applicability of Dynamic Results on Deferred Data Structures

Applicability of Refined Results on Deferred Data Structures

Instance Optimality

In Classical Computational Models and Applications

Generalisation to Non Binary Output Alphabets

External Memory

Variants of the Optimal Prefix Free Code Problem

Journal: Algorithms	Publication Date: Dec 31, 2019
Citations: 5	License type: CC BY 4.0
