Abstract

The Burrows-Wheeler Transform (BWT) has received special attention due to its effectiveness in lossless data compression algorithms. Because computing the BWT is time-consuming, real-time applications require an efficient hardware accelerator that can sustain high throughput. This paper presents a novel BWT accelerator based on a streaming sorting network. The streaming sorting network performs suffix sorting over large amounts of data, which is the most difficult task in BWT. Our BWT accelerator is implemented on a NetFPGA board. Experimental results show that it achieves a 14.3x speedup over the state-of-the-art work when the data block size is 4 KB. Furthermore, we design and implement a lossless data compression system based on the proposed BWT accelerator. The hardware system is composed of a Burrows-Wheeler Transform module, a move-to-front encoding module, a run-length encoding module, and a canonical Huffman encoding module. We evaluate the system performance on a NetFPGA board at a frequency of 155 MHz. The system reaches a throughput of 179 MB/s on board when only one streaming sorting network is used for a 4 KB block. In simulation on a Virtex UltraScale xcvu440 chip, the system throughput scales linearly to 537 MB/s when three streaming sorting networks are used to compute the BWT.
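
As a point of reference for the first two stages of the pipeline described above, the following is a minimal software sketch of a block-wise BWT followed by move-to-front encoding. It is not the accelerator's design: the paper sorts suffixes with a streaming sorting network in hardware, whereas this sketch simply sorts the rotations of a block with Python's built-in sort. The function names (`bwt`, `inverse_bwt`, `move_to_front`) and the choice to return a primary index rather than append a sentinel byte are assumptions made for illustration.

```python
from typing import List, Tuple


def bwt(block: bytes) -> Tuple[bytes, int]:
    """Return (last column, primary index) for one data block.

    Rotations are sorted naively here; this sorting step is what the
    paper offloads to a streaming sorting network (assumed equivalent
    in output, not in method).
    """
    n = len(block)
    if n == 0:
        return b"", 0
    doubled = block + block  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    primary = order.index(0)  # row holding the original block
    return last, primary


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original block from the last column and primary index."""
    n = len(last)
    # A stable sort of positions by byte value links each entry of the
    # first (sorted) column to its position in the last column.
    link = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = link[row]
        out.append(last[row])
    return bytes(out)


def move_to_front(data: bytes) -> List[int]:
    """Encode each byte as its current position in a self-organizing list."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.pop(idx)
        alphabet.insert(0, b)  # recently seen bytes get small codes
    return out


if __name__ == "__main__":
    last, primary = bwt(b"banana")
    print(last, primary)
    print(move_to_front(last))
    print(inverse_bwt(last, primary))
```

The BWT groups equal bytes into runs, so the move-to-front output contains many zeros and small values, which is what makes the subsequent run-length and Huffman stages effective.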

