Abstract

Data compression has become a commodity feature for improving space efficiency and performance by reducing read and write traffic and storage capacity demand. This technology is particularly valuable for a file system that manages and serves big data processing tasks. However, a fixed data compression scheme cannot fit all big data workloads and datasets, which vary widely in internal structure and compressibility. This paper investigates a dynamic and smart data compression algorithm selection scheme for different big data processing cases in the local file system. To this end, we propose a dynamic algorithm selection module for ZFS on Linux, an open-source file system. The module selects a high-compression-ratio algorithm for highly compressible data, a fast compression algorithm for data with low compressibility, and skips compression entirely for incompressible data. Comprehensive evaluations validate that the dynamic algorithm selection module can achieve up to 2.69x response time improvement for read and write operations in the file system and reduce storage space by about 32.12% for a large dataset.
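
The selection policy described above maps an estimated compressibility of a block to one of three actions: compress for ratio, compress for speed, or skip compression. As a rough illustration only, the following C sketch shows one way such a decision could be made; the entropy-based estimator, the thresholds, and all names here are assumptions for exposition, not the paper's actual ZFS implementation.

/*
 * Minimal sketch of a per-block compressor selection policy.
 * Illustrative only; thresholds and heuristics are assumptions.
 */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

enum comp_choice {
    COMP_HIGH_RATIO, /* e.g. gzip: spend CPU on highly compressible data */
    COMP_FAST,       /* e.g. lz4: cheap compression for modest gains     */
    COMP_NONE        /* incompressible data: skip compression entirely   */
};

/* Rough compressibility estimate: Shannon entropy of the block's bytes,
 * in bits per byte (0 = trivially compressible, 8 = random-looking). */
static double sample_entropy(const uint8_t *buf, size_t len)
{
    size_t counts[256] = {0};
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;

    double h = 0.0;
    for (int b = 0; b < 256; b++) {
        if (counts[b] == 0)
            continue;
        double p = (double)counts[b] / (double)len;
        h -= p * log2(p);
    }
    return h;
}

/* Pick a compressor for one block based on estimated compressibility. */
enum comp_choice select_compressor(const uint8_t *buf, size_t len)
{
    if (len == 0)
        return COMP_NONE;

    double h = sample_entropy(buf, len);

    if (h > 7.5)   /* near-random: compression would be wasted work */
        return COMP_NONE;
    if (h > 5.0)   /* modestly compressible: favor speed */
        return COMP_FAST;
    return COMP_HIGH_RATIO; /* highly compressible: favor ratio */
}

In practice a file system would sample only a prefix of each block rather than scan it fully, since the estimate must cost far less than the compression it is meant to avoid.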
