Abstract

Although traditional universal compression algorithms can effectively exploit repetition that falls within a sliding window, they lose this advantage for message sources in which similar messages are distributed uniformly across the source, so that similar messages rarely share a window. In this paper, we propose a universal segmenting-sorting compression algorithm to solve this problem. The key idea is to reorder the message source before compressing it with the LZ77 algorithm. We design transformation methods for two common data types: corpora of web pages and access logs. The experimental results show that the segmenting-sorting transformation improves the compression ratio. Our new algorithm achieves a compression ratio 20% to 50% lower than that of the naive LZ77 algorithm and takes almost the same decompression time. For some read-heavy sources, segmenting-sorting compression can reduce space cost while guaranteeing throughput.
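The reordering idea can be illustrated with a minimal sketch. This is not the transformation designed in the paper: it assumes newline-delimited records, sorts them lexicographically, and uses Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for a plain LZ77 coder. The function names `segment_sort_compress` and `segment_sort_decompress` and the header layout are hypothetical, chosen only for illustration.

```python
import struct
import zlib


def segment_sort_compress(data: bytes, sep: bytes = b"\n") -> bytes:
    """Split into segments, sort them, then compress (illustrative only)."""
    segments = data.split(sep)
    # order[k] is the original index of the segment placed at sorted position k.
    order = sorted(range(len(segments)), key=lambda i: segments[i])
    reordered = sep.join(segments[i] for i in order)
    # Assumed header: segment count followed by the permutation, as
    # little-endian 32-bit unsigned integers, so the original order is recoverable.
    header = struct.pack(f"<I{len(order)}I", len(order), *order)
    return zlib.compress(header + reordered, 9)


def segment_sort_decompress(blob: bytes, sep: bytes = b"\n") -> bytes:
    """Invert segment_sort_compress and restore the original segment order."""
    raw = zlib.decompress(blob)
    (count,) = struct.unpack_from("<I", raw, 0)
    order = struct.unpack_from(f"<{count}I", raw, 4)
    sorted_segments = raw[4 + 4 * count:].split(sep)
    segments = [b""] * count
    for position, original_index in enumerate(order):
        segments[original_index] = sorted_segments[position]
    return sep.join(segments)


if __name__ == "__main__":
    log = b"GET /a\nPOST /b\nGET /a\nGET /c\n"
    packed = segment_sort_compress(log)
    assert segment_sort_decompress(packed) == log
```

Sorting brings similar records next to each other, so their repetition is more likely to fall inside the compressor's window. The stored permutation adds overhead of its own, so any gain depends on the data; this sketch makes no claim about reproducing the ratios reported above.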
