FPGA-Based Large-Scale Sorting with Optimized Bandwidth Utilization

Mingqian Sun, Guangwei Xie, Fan Zhang, Wei Guo, Xitian Fan, Li Chen, Jiayu Du

doi:10.1145/3716392

Abstract

Fast sorting of large-scale data is an essential task in data centers. In previous work, the computational model of the sorting kernel leaves bandwidth on the external memory bus underutilized. In addition, merge operations in FPGA merge sort circuits depend on control commands from the host CPU, so the merge sort is not fully offloaded to hardware, which costs performance. We design an on-chip merge sort controller that commands the merge sort process efficiently. The controller can schedule multiple on-chip computing kernels simultaneously, which improves the circuit's bandwidth utilization. We also analyze the fundamental factors that affect merge sort performance and propose a high-performance merge sort architecture. Results show that the proposed controller-centered architecture improves overall sorting throughput by 20-30%. Compared with the state-of-the-art FPGA merge sort implementation, our circuit achieves a 1.22\(\times\)/1.46\(\times\) speedup.
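
As a rough software illustration of the multi-pass merge process that such a controller drives, the sketch below models a loop that issues merge passes back to back until a single sorted run remains. It is not the authors' circuit: the names `merge_pass` and `controller_sort` and the parameters `run_len` and `fan_in` are hypothetical, and the loop only stands in for the idea of sequencing passes without a per-pass host command. Each pass reads and writes the whole data set once, so a larger merge fan-in means fewer trips across the external memory bus.

```python
import heapq
from typing import List, Tuple


def merge_pass(runs: List[List[int]], fan_in: int) -> List[List[int]]:
    """Merge every group of `fan_in` adjacent sorted runs into one longer run."""
    return [
        list(heapq.merge(*runs[i:i + fan_in]))
        for i in range(0, len(runs), fan_in)
    ]


def controller_sort(
    data: List[int], run_len: int = 4, fan_in: int = 2
) -> Tuple[List[int], int]:
    """Hypothetical software stand-in for an on-chip merge controller.

    Returns the sorted data and the number of merge passes issued.
    """
    # Initial sorted runs, as a small on-chip sorter might produce them (assumption).
    runs = [sorted(data[i:i + run_len]) for i in range(0, len(data), run_len)]
    passes = 0
    # Issue passes until one run remains; each pass streams all data once.
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
        passes += 1
    return (runs[0] if runs else []), passes
```

With 16 elements and `run_len=4`, a fan-in of 2 needs two passes while a fan-in of 4 needs one, which is the kind of trade-off between merge width and memory traffic that the abstract's performance analysis refers to.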
