Abstract
Fast sorting of large-scale data is an essential task in data centers. In previous work, the computational model of the sorting kernel leaves bandwidth on the external memory bus underutilized, and the execution of merge operations in FPGA merge sort circuits depends on control commands from the host CPU. As a result, merge sort is not fully offloaded to the hardware layer, which costs performance. We design an on-chip merge sort controller that commands the merge sort process efficiently. The proposed controller can schedule multiple on-chip computing kernels simultaneously and more efficiently, giving the circuit better bandwidth utilization. We also study and analyze the fundamental factors affecting merge sort performance and propose a high-performance merge sort architecture. Results show that the proposed controller-centered architecture improves overall sorting throughput by 20-30%. Compared with the state-of-the-art merge sort implementation on FPGA, our circuit achieves a 1.22/1.46\(\times\) speedup.