Abstract

Encoding a sequence of symbols over a large alphabet is a challenging problem with applications in many fields. The most widely used adaptive entropy coding techniques (namely, arithmetic and Huffman coding) are known to incur an average codeword length that can significantly exceed the empirical entropy of the sequence as the alphabet size grows. In this work we introduce an efficient and easy-to-implement method for adaptive encoding over large alphabets. We propose a conceptual framework in which a sequence of symbols over a large alphabet is decomposed into multiple almost independent sequences over smaller alphabets, and each of these sequences is then encoded separately. This makes it possible to encode small-alphabet sequences, at the cost of the residual dependence among them. We demonstrate the advantages of our suggested scheme through a series of theorems and experiments, showing that it reduces both the average codeword length and the compression runtime in many large-alphabet setups.
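The abstract does not specify the paper's exact decomposition rule, but the general idea of splitting one large-alphabet sequence into several small-alphabet streams can be sketched as follows. This is a minimal illustrative example (my own, not the authors' scheme): each symbol from an alphabet of size B**D is written as D base-B digits, yielding D streams over an alphabet of size B, each of which could then be passed to a standard adaptive entropy coder.

```python
def decompose(seq, base, n_digits):
    """Split each large-alphabet symbol into its base-`base` digits
    (least significant digit first), producing `n_digits` small-alphabet
    streams. Illustrative sketch only; not the paper's exact method."""
    streams = [[] for _ in range(n_digits)]
    for s in seq:
        for d in range(n_digits):
            streams[d].append(s % base)
            s //= base
    return streams

def recompose(streams, base):
    """Inverse of decompose: rebuild the original symbols losslessly."""
    n_digits = len(streams)
    length = len(streams[0])
    return [sum(streams[d][i] * base**d for d in range(n_digits))
            for i in range(length)]

# Symbols drawn from an alphabet of size 256**3 become three byte streams.
seq = [70000, 123, 65535, 16777215]
streams = decompose(seq, base=256, n_digits=3)
assert recompose(streams, base=256) == seq
```

If the digit streams were truly independent, encoding them separately would cost nothing in rate; in practice some dependence remains, which is the trade-off the abstract refers to.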
