Abstract
Text compression is an active research field, and many text compression algorithms have already been developed. It remains an important problem because the use of internet bandwidth is increasing considerably. This article proposes a lossless text compression algorithm based on the Burrows–Wheeler transform and pattern matching that uses Huffman coding to achieve a high compression ratio. The algorithm uses two keys to reduce the more frequently repeated characters after the Burrows–Wheeler transform. We then find patterns of a certain length in the reduced text and apply Huffman encoding. We compare the proposed technique with state-of-the-art text compression algorithms and conclude that it demonstrates a gain in compression ratio over the other techniques. A small problem with our proposed method is that it does not work very well for symmetric communications like Brotli.
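As a rough illustration of the two named building blocks, the sketch below implements a naive Burrows–Wheeler transform and computes Huffman code lengths in Python. It is not the proposed algorithm: the two-key reduction and the pattern-matching step are not specified in this abstract and are deliberately left out. The function names, the end-of-string marker `\x03`, and the rotation-sorting approach are assumptions made for illustration only.

```python
import heapq
from collections import Counter

EOS = "\x03"  # assumed end-of-string marker; must not occur in the input


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if EOS in text:
        raise ValueError("input must not contain the end-of-string marker")
    s = text + EOS
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str) -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(EOS))
    return original[:-1]


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) of each symbol in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

For example, `bwt("banana")` groups the repeated letters into `"annb" + EOS + "aa"`, and `inverse_bwt` recovers `"banana"` from that string. The rotation sort is quadratic in memory and is only suitable for short inputs; practical implementations typically use suffix arrays.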
Highlights
Managing the increasing amount of data that are produced by modern daily life activities is not a simple task for symmetric communications
Although researchers have developed many lossless text compression algorithms, these do not fulfill the current demand, and researchers are still trying to develop a more efficient algorithm. From this point of view, we propose in this paper a lossless text compression procedure using the Burrows–Wheeler transformation and Huffman coding
The state-of-the-art techniques and the proposed method are compared based on the compression ratio (CR) that is calculated using Equation (1)
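Equation (1) is not reproduced in this excerpt, so the following Python sketch assumes one common definition of the compression ratio: original size divided by compressed size, both in bytes. Under that assumption a larger CR means better compression. The function name is hypothetical, and `zlib` is used only as a stand-in compressor, not as one of the methods compared in the article.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """CR = original size / compressed size (assumed definition; larger is better)."""
    if not compressed:
        raise ValueError("compressed data must not be empty")
    return len(original) / len(compressed)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    packed = zlib.compress(text, 9)  # stand-in compressor for illustration
    print(f"CR = {compression_ratio(text, packed):.2f}")
```

If the article instead defines CR as compressed size over original size, the fraction is simply inverted and smaller values are better.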
Summary
Managing the increasing amount of data that are produced by modern daily life activities is not a simple task for symmetric communications. In articles [1,2], it is reported that, on average, 4.4 zettabytes and 2.5 exabytes of data were produced per day in 2013 and 2015, respectively. Although companies are producing plenty of hardware in an attempt to provide a better solution for working with huge amounts of data, it is almost impossible to maintain this data without compression. There are two types of compression techniques: lossless and lossy [8,9]. Lossless compression reproduces the data perfectly from its encoded bit stream, whereas lossy compression removes less significant information [10,11].
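The difference between the two families can be shown with a toy Python example. The lossless half uses the standard `zlib` module as a stand-in compressor; the lossy half uses a made-up quantization step that discards the low-order part of each numeric sample. Both function names and the `step` parameter are illustrative assumptions and are not taken from the article.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress; the output is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: list[int], step: int = 16) -> list[int]:
    """Toy lossy step: round each sample down to a multiple of `step`.

    The discarded remainder cannot be recovered afterwards.
    """
    return [(s // step) * step for s in samples]
```

Here `lossless_roundtrip(b"abracadabra")` returns the original bytes unchanged, while `lossy_quantize([3, 17, 40])` returns `[0, 16, 32]`, from which the original samples can no longer be reconstructed. Text generally requires the lossless kind, since changing even one character can alter its meaning.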