Burrows–Wheeler Transform Based Lossless Text Compression Using Keys and Huffman Coding

Md Atiqur Rahman, Mohamed Hamada

doi:10.3390/sym12101654

Abstract

Text compression is a significant research field, and various text compression algorithms have already been developed. The topic matters because the use of internet bandwidth is increasing considerably. This article proposes a lossless text compression algorithm based on the Burrows–Wheeler transform and pattern matching that uses Huffman coding to achieve a high compression ratio. The algorithm uses two keys to reduce the more frequently repeated characters after the Burrows–Wheeler transform. We then find patterns of a certain length in the reduced text and apply Huffman encoding. We compare the proposed technique with state-of-the-art text compression algorithms and conclude that it demonstrates a gain in compression ratio over them. A small problem with our proposed method is that it does not work very well for symmetric communications like Brotli.
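The two standard building blocks named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it covers only a naive Burrows–Wheeler transform and plain Huffman coding, and it omits the key-based reduction and pattern-matching steps, whose details are not given here. The function names and the `EOS` sentinel are assumptions made for the example.

```python
import heapq
from collections import Counter

EOS = "\0"  # assumed end-of-string sentinel; must not occur in the input


def bwt(text):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + EOS
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(EOS))
    return row[:-1]  # drop the sentinel


def huffman_codes(data):
    """Build a prefix-free Huffman code table {symbol: bit string}."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # a single symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text):
    """BWT followed by Huffman coding; returns (bit string, code table)."""
    transformed = bwt(text)
    codes = huffman_codes(transformed)
    return "".join(codes[ch] for ch in transformed), codes


def decode(bits, codes):
    """Reverse of encode: Huffman-decode, then invert the BWT."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return inverse_bwt("".join(out))
```

The transform itself does not shrink the text; it tends to group identical characters into runs, which is what a later reduction step can exploit. The rotation sort used here is quadratic in memory and is only suitable for short inputs.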

Highlights

Managing the increasing amount of data that are produced by modern daily life activities is not a simple task for symmetric communications.
Although researchers have developed many lossless text compression algorithms, these do not fulfill the current demand, and researchers are still trying to develop more efficient ones. From this point of view, this paper proposes a lossless text compression procedure that uses the Burrows–Wheeler transform and Huffman coding.
The state-of-the-art techniques and the proposed method are compared based on the compression ratio (CR), which is calculated using Equation (1).
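Equation (1) is not reproduced in this summary. The sketch below assumes the common definition of compression ratio, original size divided by compressed size, which may differ from the paper's exact formula. The helper name `compression_ratio` is hypothetical, and `zlib` (DEFLATE) is used only as a stand-in compressor, not as the proposed method.

```python
import zlib


def compression_ratio(original, compressed):
    """Assumed definition: CR = original size / compressed size (in bytes)."""
    if not compressed:
        raise ValueError("compressed data must be non-empty")
    return len(original) / len(compressed)


# Highly repetitive sample text, so the stand-in compressor does well.
text = ("abracadabra " * 200).encode("utf-8")
packed = zlib.compress(text, 9)

# Lossless: decompression must reproduce the input exactly.
assert zlib.decompress(packed) == text

cr = compression_ratio(text, packed)
```

Under this definition a larger CR means better compression, and a CR below 1 means the output is larger than the input.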

Summary

Introduction

Managing the increasing amount of data that are produced by modern daily life activities is not a simple task for symmetric communications. Articles [1,2] report that, on average, 4.4 zettabytes and 2.5 exabytes of data were produced per day in 2013 and 2015, respectively. Although companies are producing plenty of hardware in an attempt to provide a better solution for working with huge amounts of data, it is almost impossible to maintain this data without compression. There are two types of compression techniques: lossless and lossy [8,9]. Lossless compression reproduces the data perfectly from its encoded bit stream, whereas lossy compression removes less significant information [10,11].

Methods

Results

Conclusion