Abstract
The adaptive Huffman algorithm is a widely used data compression technique that builds a variable-length binary code for each symbol in a message. However, the original algorithm can compress text data inefficiently, particularly when dealing with long sequences of repeated characters. In this study, we propose a novel approach that improves the compression ratio of the adaptive Huffman algorithm by combining text clustering with multiple character modification. The proposed method first clusters the text data into groups of similar words or phrases. It then modifies multiple characters in each group to reduce redundancy and increase the frequency of the most common characters. Because Huffman coding assigns shorter codes to more frequent symbols, this modification lets the adaptive Huffman algorithm produce shorter codes for the modified characters and compress the clustered text more effectively. Experimental results on a benchmark dataset show that the proposed method achieves better compression ratios than the traditional adaptive Huffman algorithm and other state-of-the-art compression methods. The proposed method can be applied to various kinds of text data, such as documents, emails, and chat messages, and can significantly reduce storage and transmission costs.
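The principle behind the character modification step, that a more skewed symbol distribution yields a shorter Huffman encoding, can be illustrated with a minimal sketch. This is not the proposed method: it uses a static Huffman code rather than the adaptive variant, the clustering and modification steps are not implemented because the abstract does not specify them, and the function names `huffman_code_lengths` and `encoded_bits` are hypothetical. It also ignores any side information a real scheme would need to reverse the modification.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a static Huffman code.

    Illustrative helper (assumption): static, not adaptive, Huffman coding.
    """
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text):
    """Total number of bits needed to encode text with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    return sum(freq[s] * lengths[s] for s in freq)


# Same length and alphabet, different frequency skew.
uniform = "abcdabcd"   # each symbol appears twice
skewed = "aaaaabcd"    # 'a' dominates
print(encoded_bits(uniform), encoded_bits(skewed))
```

Both strings have eight characters over the same four symbols, yet the skewed one needs fewer bits, which is the effect the modification step aims to exploit.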