Summary

Deflate coding is a widely used lossless data compression method, found in zlib, gzip (GNU zip), and zip, that combines the LZSS compression algorithm with Huffman coding. Deflate encoding and decoding both involve inherently sequential operations, so accelerating them in parallel on a GPU is difficult. The main purpose of this paper is to present GPU implementations of Deflate encoding and decoding. To make these implementations efficient, we use multiple small hash tables so that multiple threads can search the dictionary for matching subsequences in parallel, and we apply the Single Kernel Soft Synchronization (SKSS) technique to fully utilize GPU computing resources. We also adopt Huffman coding with gap arrays to accelerate parallel Huffman decoding. We have evaluated our GPU implementations on an NVIDIA A100 GPU and compared them with parallel (multi-threaded) and sequential (single-threaded) Deflate encoding and decoding on Intel x86 multicore CPUs. Our GPU implementation of Deflate decoding is 1.66x–8.33x faster than the multi-threaded CPU implementation and 4.13x–36.56x faster than the single-threaded CPU implementation.
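
To illustrate the hash-based search for matching subsequences in general terms, the following is a minimal sequential sketch in Python. It is not the GPU implementation presented in the paper: it uses a single hash table on one thread, whereas the paper uses multiple small hash tables searched by many threads. The names `hash3`, `lzss_encode`, and `lzss_decode`, the hash function, and the table size `HASH_BITS` are assumptions chosen for illustration. The match-length and window limits follow the Deflate format.

```python
# Illustrative sketch only: sequential LZSS-style matching with one hash table.
# The hash function and HASH_BITS are hypothetical choices, not from the paper.
MIN_MATCH = 3          # shortest match Deflate can encode
MAX_MATCH = 258        # longest match Deflate can encode
WINDOW = 32 * 1024     # Deflate sliding-window (dictionary) size
HASH_BITS = 12         # assumed table size: 4096 entries


def hash3(data: bytes, i: int) -> int:
    """Hash the 3 bytes starting at position i into a table index."""
    return ((data[i] << 8) ^ (data[i + 1] << 4) ^ data[i + 2]) & ((1 << HASH_BITS) - 1)


def lzss_encode(data: bytes) -> list:
    """Return a list of tokens: an int literal or a (distance, length) pair."""
    table = [-1] * (1 << HASH_BITS)  # most recent position seen for each hash
    tokens = []
    n = len(data)
    i = 0
    while i < n:
        best_len = best_dist = 0
        if i + MIN_MATCH <= n:
            h = hash3(data, i)
            cand = table[h]
            table[h] = i
            if cand >= 0 and i - cand <= WINDOW:
                # Extend the candidate match byte by byte (overlap is allowed).
                length = 0
                while (length < MAX_MATCH and i + length < n
                       and data[cand + length] == data[i + length]):
                    length += 1
                if length >= MIN_MATCH:
                    best_len, best_dist = length, i - cand
        if best_len:
            tokens.append((best_dist, best_len))
            i += best_len  # positions inside the match are not hashed here
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lzss_decode(tokens: list) -> bytes:
    """Rebuild the original bytes from literals and (distance, length) pairs."""
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy handles overlap
        else:
            out.append(t)
    return bytes(out)


sample = b"abcabcabcabc"
tokens = lzss_encode(sample)
restored = lzss_decode(tokens)
```

In this sketch each table slot keeps only the most recent position for a given hash, so the encoder examines at most one candidate per input position. A full Deflate encoder would additionally Huffman-code the resulting literals and (distance, length) pairs.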