Abstract

Although there are many efficient sorting algorithms and implementations for graphics processing units (GPUs), none of them are both comparison-based and in-place. The sorting algorithm presented in this chapter, an implementation of bitonic sort for NVIDIA's GPUs, is both. Although its time complexity is O(n log² n), bitonic sort is a widely used parallel sorting algorithm because it is based on a sorting network and can therefore be parallelized efficiently. Processing a sorting network in parallel requires a mechanism for synchronization and communication between parallel processing units; in CUDA, those units are typically implemented as CUDA threads. In general, synchronization between arbitrary threads is not possible in CUDA, so to enforce a specific order of tasks, the tasks have to be executed in consecutive kernel launches. "Communication" among consecutive kernel launches is achieved by writing to (persistent) global GPU memory. The chapter focuses on two main aspects of implementing bitonic sort for NVIDIA's GPUs: reducing the communication and synchronization induced by bitonic sort, and making extensive use of shared memory together with efficient intra-block synchronization. Both reduce the number of kernel launches and global memory accesses.
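To make the structure of the sorting network concrete, the following is a minimal sequential sketch of bitonic sort in Python, not the authors' CUDA implementation. Each inner pass (one `(k, j)` pair) performs only independent compare-exchange operations, which is exactly why a pass can run in parallel; on a GPU, each such pass would typically correspond to one kernel launch (or, as the chapter optimizes, to a step within shared memory).

```python
def bitonic_sort(a):
    """In-place, comparison-based bitonic sort.
    len(a) must be a power of two."""
    n = len(a)
    k = 2
    while k <= n:          # stage: size of bitonic sequences being merged
        j = k // 2
        while j >= 1:      # pass: compare-exchange distance (one "kernel launch")
            for i in range(n):
                partner = i ^ j          # partner index differs in bit j
                if partner > i:          # each pair is handled exactly once
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Because no element participates in more than one compare-exchange per pass, all iterations of the inner `for` loop are independent and can be mapped one-to-one onto CUDA threads; the sequential pass boundary plays the role of the global synchronization that consecutive kernel launches provide.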

