Abstract

Natural language compression has made great progress over the last two decades. The main step in this evolution was the introduction of word-based compression by Moffat. A further improvement came with the so-called Dense codes, which proved to be very fast in both compression and decompression while keeping a good compression ratio and the ability to search the compressed text directly. Many variants of Dense codes have been described, each with its own definition. In this paper, we present a generalized concept of dense coding called Open Dense Code (ODC), which is intended as a framework for defining many other dense code schemes. ODC highlights the features that dense code schemes have in common while still allowing the differences between them to be expressed. Using the ODC framework, we present two new word-based statistical compression algorithms built on the dense coding idea: Two Byte Dense Code (TBDC) and Self-Tuning Dense Code (STDC). Our algorithms improve the compression ratio and also perform well on smaller files, which other compressors very often neglect.
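To make the dense coding idea concrete, the following is a minimal sketch of End-Tagged Dense Code (ETDC), one well-known member of the dense code family. It is not ODC, TBDC, or STDC, which are defined in the full paper and are not reproduced here. The function names (`etdc_encode`, `etdc_decode_stream`, `compress`, `decompress`) are hypothetical, and the sketch assumes the text is already tokenized into words and that the vocabulary is sent alongside the codewords.

```python
from collections import Counter


def etdc_encode(rank):
    """Return the ETDC codeword for a 0-based frequency rank.

    The last byte of every codeword has its high bit set (128..255);
    all preceding bytes are in 0..127. More frequent words (lower ranks)
    get shorter codewords.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    out = [128 + rank % 128]          # tagged final byte
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)        # untagged leading bytes
        rank = rank // 128 - 1
    return bytes(reversed(out))


def etdc_decode_stream(data):
    """Split a byte stream into codewords and return the decoded ranks."""
    ranks = []
    rank = 0
    for b in data:
        if b >= 128:                  # tagged byte ends the codeword
            ranks.append(rank * 128 + (b - 128))
            rank = 0
        else:
            rank = rank * 128 + b + 1
    return ranks


def compress(words):
    """Rank words by frequency and concatenate their codewords."""
    vocab = [w for w, _ in Counter(words).most_common()]
    rank_of = {w: r for r, w in enumerate(vocab)}
    data = b"".join(etdc_encode(rank_of[w]) for w in words)
    return vocab, data


def decompress(vocab, data):
    """Invert compress() given the same vocabulary."""
    return [vocab[r] for r in etdc_decode_stream(data)]
```

Because the final byte of each codeword is tagged, codeword boundaries can be found without decoding from the start of the stream, which is what makes direct searching of the compressed text possible. Ranks 0 to 127 take one byte, and the next 16,384 ranks take two bytes.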
