Abstract

The main contribution of this paper is to present efficient FIFO-based hardware sorters that sort n elements of w bits each, stored in a high bandwidth memory with modest access latency. We assume that each address of the high bandwidth memory stores p elements of w bits each, which can be read or written at the same time. The access latency of the high bandwidth memory is l: it takes l clock cycles to access the p elements at a specified address. Furthermore, burst mode is supported, so k (≥ 1) consecutive addresses can be accessed in k+l-1 clock cycles in a pipelined fashion. However, if the k addresses are not consecutive, kl clock cycles are necessary to access all of them. Clearly, all n elements arranged in n/p consecutive addresses can be duplicated in 2(n/p+l-1) clock cycles (one burst read followed by one burst write). We present two types of hardware sorters that sort n=rc elements stored in an r×c matrix of the high bandwidth memory. We first develop Three-Pass-Sort and Four-Pass-Sort, which sort an r×c matrix by reading from and writing to it three times and four times, respectively. We implement these two algorithms using FIFO-based mergers that can be configured in pairwise mode and sliding mode. Our hardware sorter based on Three-Pass-Sort runs in 6n/p+3c^2/p^2l+O(c/p(l+log r)+r) clock cycles using a circuit of size O(rwp) provided that r≥c^2. Also, our hardware sorter based on Four-Pass-Sort runs in 8n/p+2c^2l+O(cl+log r+p) clock cycles using a circuit of size O(rw).
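The memory access model stated in the abstract can be made concrete with a small cost calculator. The following is a minimal sketch, not the authors' implementation: the function names (`burst_cycles`, `scattered_cycles`, `copy_cycles`) are hypothetical, and it assumes that p divides n and that the n/p addresses are consecutive.

```python
def burst_cycles(k: int, l: int) -> int:
    """Clock cycles to access k consecutive addresses in burst mode: k + l - 1."""
    if k < 1 or l < 1:
        raise ValueError("k and l must be at least 1")
    return k + l - 1


def scattered_cycles(k: int, l: int) -> int:
    """Clock cycles to access k non-consecutive addresses: k * l."""
    if k < 1 or l < 1:
        raise ValueError("k and l must be at least 1")
    return k * l


def copy_cycles(n: int, p: int, l: int) -> int:
    """Clock cycles to duplicate n elements stored p per address.

    Assumption (not stated in this form in the abstract): p divides n,
    and the copy is one burst read followed by one burst write.
    """
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    addresses = n // p
    return 2 * burst_cycles(addresses, l)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    n, p, l = 1024, 16, 8
    k = n // p  # 64 addresses
    print(burst_cycles(k, l))      # 64 + 8 - 1 = 71
    print(scattered_cycles(k, l))  # 64 * 8 = 512
    print(copy_cycles(n, p, l))    # 2 * 71 = 142
```

With these illustrative parameters, accessing the same 64 addresses costs 71 cycles in burst mode and 512 cycles when the addresses are scattered, which shows why algorithms in this model are organized around long runs of consecutive addresses.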

