Abstract

The Burrows-Wheeler Transform (BWT) has played an important role in lossless data compression algorithms. To achieve a good compression ratio, the BWT block size needs to be several hundreds of kilobytes, which requires a large amount of on-chip memory resources and limits effective hardware implementations. In this paper, we analyze the bottleneck of the BWT acceleration and present a novel design to map the anti-sequential suffix sorting algorithm to FPGAs. Our design can perform BWT with a block size of up to 500KB (i.e., bzip2 level 5 compression) on the Xilinx Virtex UltraScale+ VCU1525 board, while the state-of-art FPGA implementation can only support 4KB block size. Experiments show our FPGA design can achieve ~2x speedup compared to the best CPU implementation using standard large Corpus benchmarks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call