Abstract

Accelerating sorting using dedicated hardware to fully utilize the memory bandwidth for Big Data applications has gained much interest in the research community. Recently, parallel sorting networks have been widely employed in hardware implementations due to their high data parallelism and low control overhead. In this paper, we propose a systematic methodology for mapping large-scale bitonic sorting networks onto FPGAs. To realize data permutations in the sorting network, we develop a novel RAM-based design by vertically “folding” the classic Clos network. By utilizing the proposed design for data permutation, we develop a hardware generator to automatically build bitonic sorting architectures on FPGAs. For a given input size, data width, and degree of data parallelism, the hardware generator specializes both the datapath and the control unit for sorting and generates a design in a high-level hardware description language. We demonstrate trade-offs among throughput, latency, and area using two illustrative sorting designs: a high-throughput design and a resource-efficient design. With a data parallelism of p (2 ≤ p ≤ N/2), the high-throughput design sorts an N-key sequence with latency 6N/p + o(N), delivers a throughput of p results per cycle, and uses 6N + o(N) words of memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) and outperforms the state-of-the-art. Experimental results show that the designs obtained by our proposed hardware generator achieve 49 to 112 percent improvement in energy efficiency and 56 to 430 percent higher memory efficiency compared with the state-of-the-art.
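For readers unfamiliar with the underlying algorithm, the following is a minimal software reference model of a bitonic sorting network (the classic Batcher construction, assuming a power-of-two input size). The paper's contribution is the FPGA mapping of this network, not the algorithm itself; each compare-exchange below corresponds to one comparator in the hardware network.

```python
def bitonic_sort(keys, ascending=True):
    """Sort a power-of-two-length list with Batcher's bitonic network."""
    n = len(keys)
    if n <= 1:
        return list(keys)
    half = n // 2
    # Build a bitonic sequence: ascending first half, descending second half.
    first = bitonic_sort(keys[:half], True)
    second = bitonic_sort(keys[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(keys, ascending):
    """Merge a bitonic sequence into a sorted one via compare-exchange stages."""
    n = len(keys)
    if n <= 1:
        return list(keys)
    half = n // 2
    arr = list(keys)
    for i in range(half):
        # One comparator: swap so the pair agrees with the sort direction.
        if (arr[i] > arr[i + half]) == ascending:
            arr[i], arr[i + half] = arr[i + half], arr[i]
    return (bitonic_merge(arr[:half], ascending) +
            bitonic_merge(arr[half:], ascending))
```

The data-independent comparator pattern (each stage compares fixed index pairs, regardless of the input values) is what gives the network its low control overhead and makes it amenable to the folded, RAM-based hardware realization described in the abstract.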
