Abstract

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data dependences, such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared to state-of-the-art GPU sorting. We also reverse the performance differentials between GPU and multi/many-core CPU architectures reported by recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one for each partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases. Together, these features allow us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
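To make the relationship between radix sorting and prefix scan concrete, the following is a minimal sequential sketch in Python, not the GPU implementation described above. It assumes non-negative integer keys and a 4-bit digit per pass. The names `exclusive_scan`, `radix_pass`, and `radix_sort` are hypothetical and chosen only for illustration. Each pass runs one exclusive scan per partitioning bin (the idea behind multi-scan, here performed serially rather than concurrently) to rank keys within their bin, plus one scan over the bin totals to find where each bin starts in the output.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    out = []
    total = 0
    for v in values:
        out.append(total)
        total += v
    return out


def radix_pass(keys, shift, radix_bits=4):
    """One stable partitioning pass on the digit at bit offset `shift`."""
    num_bins = 1 << radix_bits
    mask = num_bins - 1
    digits = [(k >> shift) & mask for k in keys]

    # One flag vector and one scan per bin: ranks[b][i] is the number of
    # earlier keys whose digit equals b.
    ranks = [
        exclusive_scan([1 if d == b else 0 for d in digits])
        for b in range(num_bins)
    ]

    # Scan of the per-bin totals gives each bin's starting output offset.
    counts = [digits.count(b) for b in range(num_bins)]
    bin_base = exclusive_scan(counts)

    # Scatter every key to (start of its bin) + (its rank within the bin).
    out = [0] * len(keys)
    for i, (k, d) in enumerate(zip(keys, digits)):
        out[bin_base[d] + ranks[d][i]] = k
    return out


def radix_sort(keys, key_bits=32, radix_bits=4):
    """Least-significant-digit radix sort for keys in [0, 2**key_bits)."""
    for shift in range(0, key_bits, radix_bits):
        keys = radix_pass(keys, shift, radix_bits)
    return keys
```

For example, `radix_sort([170, 45, 75, 90, 802, 24, 2, 66])` yields the keys in ascending order. In a parallel setting the per-bin scans are independent of one another, which is what makes it possible to run them concurrently; this sketch says nothing about how the paper fuses kernels or serializes work.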
