Adaptive Huffman Algorithm for Data Compression Using Text Clustering and Multiple Character Modification

Babita Kumari, Neeraj Kumar Kamal, Arif Mohammad Sattar, Mritunjay Kr Ranjan

doi:10.37591/rtpl.v10i1.509

Abstract

The Adaptive Huffman algorithm is a popular data compression technique that assigns a variable-length binary code to each symbol in a message and updates those codes as symbol frequencies change. However, the original algorithm can compress text inefficiently, particularly when the input contains long sequences of repeated characters. In this study, we propose a novel approach that improves the compression ratio of the Adaptive Huffman algorithm by combining text clustering with multiple character modification. The proposed method first clusters the text into groups of similar words or phrases. It then modifies multiple characters within each group to reduce redundancy and raise the frequency of the most common characters. Because Huffman coding gives shorter codes to more frequent symbols, this modification lets the Adaptive Huffman algorithm assign shorter codes to the modified characters and compress the clustered text more effectively. Experimental results on a benchmark dataset show that the proposed method achieves better compression ratios than the traditional Adaptive Huffman algorithm and other state-of-the-art compression methods. The proposed method can be applied to various kinds of text data, such as documents, emails, and chat messages, and can significantly reduce storage and transmission costs.
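The link between symbol frequency and code length that the abstract relies on can be illustrated with a minimal sketch. It assumes a static Huffman code as a stand-in for the adaptive variant, and it does not reproduce the clustering or character-modification steps, whose details are not given here. The helper names `huffman_code_lengths` and `encoded_bits` are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a static Huffman code.

    Assumption: this is plain static Huffman coding, used only to show
    how symbol frequencies drive code lengths. It is not the adaptive
    algorithm or the method proposed in the paper.
    """
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}

    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in depths1.items()}
        merged.update({s: d + 1 for s, d in depths2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))

    return heap[0][2]


def encoded_bits(text):
    """Total bits needed to encode text with its own static Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


if __name__ == "__main__":
    uniform = "abcdabcd"  # four symbols, equal counts
    skewed = "aaaaaabc"   # same length, one dominant symbol
    print(encoded_bits(uniform))  # 16
    print(encoded_bits(skewed))   # 10
```

Both sample strings contain eight characters, yet the one dominated by a single symbol needs fewer bits. This is the effect the proposed character modification aims to exploit, here shown only on toy input and without the overhead of storing or adapting the code table.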

Full Text