Abstract

Top-K and selection operations are critical in data processing and analysis, and efficient GPU implementations are increasingly important as data-analysis workloads grow. Existing methods, which rely primarily on the bucket partition execution model, suffer from uneven bucket distribution and latency in the merging process. To address these issues, we introduce a novel Split-Bucket Partition (SBP) execution model. We also propose task and control flow optimizations targeted at top-K and selection algorithms, which further improve performance. Our optimized algorithms significantly outperform existing approaches, delivering performance gains of up to 2.3 times and 1.6 times for different bucket partitioning rules. They also show robust improvements in non-uniform data scenarios, with gains ranging from 1.9 times to 15.5 times. The SBP model does have limitations related to shared memory and register utilization, which can affect performance. Tests on TU102 and A100 GPU architectures validate the effectiveness of our approach, achieving a maximum speedup of 2.9 times. The study suggests that the SBP model, while effective for top-K and selection algorithms, also holds promise for other computational tasks, setting the stage for future research.
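
For readers unfamiliar with the bucket partition execution model mentioned above, the following is a minimal sequential sketch of the general idea, written in Python rather than as a GPU kernel. It illustrates only the conventional baseline (histogram the candidates into buckets, keep the bucket containing the k-th largest element, repeat on the next digit); it is not the SBP model and not the authors' implementation. The function names, the 8-bit radix, and the restriction to non-negative integers below `2**bits` are assumptions made for illustration.

```python
from typing import List


def bucket_select_kth_largest(values: List[int], k: int,
                              bits: int = 32, radix_bits: int = 8) -> int:
    """Return the k-th largest value (1-indexed) via repeated bucket partitioning.

    Assumptions (illustrative only): every value is a non-negative integer
    below 2**bits, and bits is a multiple of radix_bits.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be in [1, len(values)]")
    if bits % radix_bits != 0:
        raise ValueError("bits must be a multiple of radix_bits")

    candidates = list(values)
    num_buckets = 1 << radix_bits
    mask = num_buckets - 1
    shift = bits - radix_bits

    while shift >= 0 and len(candidates) > 1:
        # Histogram pass: count the candidates that fall into each bucket.
        counts = [0] * num_buckets
        for v in candidates:
            counts[(v >> shift) & mask] += 1

        # Scan buckets from the largest digit down to locate the bucket
        # that contains the k-th largest candidate.
        target = 0
        for b in range(num_buckets - 1, -1, -1):
            if k <= counts[b]:
                target = b
                break
            k -= counts[b]

        # Filter pass: only the target bucket survives to the next round.
        candidates = [v for v in candidates if (v >> shift) & mask == target]
        shift -= radix_bits

    return candidates[0]


def top_k(values: List[int], k: int) -> List[int]:
    """Return the k largest values (in no particular order)."""
    threshold = bucket_select_kth_largest(values, k)
    greater = [v for v in values if v > threshold]
    # Fill the remaining slots with copies of the threshold to handle ties.
    return greater + [threshold] * (k - len(greater))
```

In this sketch, skewed input shows why uneven bucket distribution matters: if nearly all values share the same leading digits, the filter pass discards very little, and each round reprocesses almost the entire input.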
