Abstract

Radix sorting is a basic data processing operation used in many fields of computing, so accelerating it on a Graphics Processing Unit (GPU) has practical significance. Heterogeneous parallel computing has attracted wide attention and is widely applied because of its high computational efficiency and its capacity for parallel, real-time data processing. Taking advantage of GPU parallelism in numerical computation, this paper proposes a parallel design of the Binary_Least Significant Digit (LSD) first Radix Sorting (B_LSD_RS) algorithm based on the Open Computing Language (OpenCL). The radix sorting algorithm is divided into multiple kernel tasks, and the execution order of the kernels is controlled by passing event information between them. The parallel algorithm is implemented and verified on a GPU + CPU heterogeneous platform. Experimental results show that, on the NVIDIA GTX 1070 computing platform, the OpenCL-based B_LSD_RS parallel algorithm achieves speedups of 28.86 times, 11.01 times and 2.14 times over, respectively, the sequential B_LSD_RS algorithm on an AMD Ryzen5 1600X CPU, the B_LSD_RS parallel algorithm based on Open Multi-Processing (OpenMP), and the B_LSD_RS parallel algorithm based on Compute Unified Device Architecture (CUDA). The algorithm thus achieves not only high performance but also performance portability across different GPU computing platforms.
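
The abstract does not spell out how the sort is split into kernels, so the following is only a minimal sequential sketch of a binary LSD radix sort, under the assumption that each one-bit pass consists of a flag step, an exclusive prefix-sum step and a scatter step (a common decomposition for GPU radix sorts, not necessarily the authors'). The function name `binary_lsd_radix_sort` and the `key_bits` parameter are hypothetical, and the keys are assumed to be non-negative integers below `2**key_bits`.

```python
def binary_lsd_radix_sort(keys, key_bits=32):
    """Sort non-negative integers below 2**key_bits, one bit per pass.

    Illustrative sketch only: the three steps inside the loop stand in for
    the separate kernels a GPU version might launch in sequence.
    """
    src = list(keys)
    n = len(src)
    for bit in range(key_bits):  # least significant bit first
        # Step 1 (flag): 1 where the current bit of the key is 0, else 0.
        is_zero = [1 - ((k >> bit) & 1) for k in src]

        # Step 2 (scan): exclusive prefix sum of the flags, so that
        # zeros_before[i] counts the zero-bit keys located before index i.
        zeros_before = [0] * n
        running = 0
        for i in range(n):
            zeros_before[i] = running
            running += is_zero[i]
        total_zeros = running

        # Step 3 (scatter): zero-bit keys keep their relative order at the
        # front; one-bit keys follow, also in their original order. This
        # stability is what makes the LSD-first passes compose correctly.
        dst = [0] * n
        for i in range(n):
            if is_zero[i]:
                pos = zeros_before[i]
            else:
                pos = total_zeros + (i - zeros_before[i])
            dst[pos] = src[i]
        src = dst
    return src


if __name__ == "__main__":
    print(binary_lsd_radix_sort([170, 45, 75, 90, 2, 802, 24, 66], key_bits=10))
    # prints [2, 24, 45, 66, 75, 90, 170, 802]
```

In a parallel version each of the three steps is data-parallel over the keys, but a step must finish before the next one starts, which is the kind of ordering the event-based kernel control described in the abstract would enforce.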
