Sorting, which is widely used in different areas such as database systems, IP routing, bio informatics, and cognitive-processing-based applications, imposes considerable overhead on computing resources. Therefore, an efficient on-chip sorting accelerator may significantly enhance real-time decision-making in such applications. In this paper we introduce a novel pipelined and parallel sorting algorithm with streaming I/O, with the time, logic, and memory complexity of O ( n ) , O ( n ) , and O ( n ) , respectively. We present a formal analysis to prove the correctness of this algorithm. We then model, verify, and synthesize this unconditional algorithm (in the TSMC 0.13 micron technology) for 4k-word clusters as an ASIC accelerating engine. More specifically, our implementation with 3969-word multiple-bank memory, 63 word-size comparators, 64 word-size multiplexers, and 63 word-size registers only requires some 8k clock cycles to sort an arbitrary 3969-word long array of random data, which arrive at the sorter and also depart it one item at a time.
Read full abstract