Bandwidth-Optimal Random Shuffling for GPUs

Rory Mitchell, Geoffrey Holmes, Daniel Stokes, Eibe Frank

doi:10.1145/3505287

Abstract

Linear-time algorithms that are traditionally used to shuffle data on CPUs, such as the method of Fisher-Yates, are not well suited to implementation on GPUs due to inherent sequential dependencies, and existing parallel shuffling algorithms are unsuitable for GPU architectures because they incur a large number of read/write operations to high-latency global memory. To address this, we provide a method of generating pseudo-random permutations in parallel by fusing suitable pseudo-random bijective functions with stream compaction operations. Our algorithm, termed “bijective shuffle”, trades increased per-thread arithmetic operations for reduced global memory transactions. It is work-efficient, deterministic, and requires only a single global memory read and write per shuffle input, thus maximising use of global memory bandwidth. To empirically demonstrate the correctness of the algorithm, we develop a statistical test for the quality of pseudo-random permutations based on kernel space embeddings. Experimental results show that the bijective shuffle algorithm outperforms competing algorithms on GPUs, showing improvements of between one and two orders of magnitude and approaching peak device bandwidth.
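The idea of fusing a bijective function with stream compaction can be illustrated with a small sequential sketch. It is not the authors' GPU implementation: the alternating Feistel network, the `mix` round function, the constants, and the names `make_bijection` and `bijective_shuffle` are all assumptions chosen for illustration, and no claim is made about the statistical quality of the resulting permutations. The sketch pads the index range up to the next power of two, applies a keyed bijection to every padded index, and keeps only the images that fall inside the input range.

```python
def make_bijection(bits, seed, rounds=8):
    """Return a keyed bijection on [0, 2**bits).

    Illustrative alternating Feistel network (an assumption, not the
    paper's cipher). Each round XORs one half of the state with a
    function of the other half, so every round is invertible.
    """
    left_bits = bits // 2
    right_bits = bits - left_bits
    left_mask = (1 << left_bits) - 1
    right_mask = (1 << right_bits) - 1
    # Hypothetical key schedule: one 32-bit round key per round.
    keys = [(seed * 0x9E3779B1 + r * 0x85EBCA6B) & 0xFFFFFFFF
            for r in range(rounds)]

    def mix(value, key):
        # Arbitrary integer mixing step; any function keeps f bijective.
        h = ((value ^ key) * 0x45D9F3B) & 0xFFFFFFFF
        return h ^ (h >> 15)

    def f(x):
        left, right = x >> right_bits, x & right_mask
        for r, key in enumerate(keys):
            if r % 2 == 0:
                left ^= mix(right, key) & left_mask
            else:
                right ^= mix(left, key) & right_mask
        return (left << right_bits) | right

    return f


def bijective_shuffle(items, seed):
    """Permute items using a bijection followed by compaction."""
    n = len(items)
    if n == 0:
        return []
    bits = (n - 1).bit_length()  # smallest bits with 2**bits >= n
    f = make_bijection(bits, seed)
    out = []
    # Sequential stand-in for stream compaction: keep images below n,
    # preserving the order of the padded indices. On a GPU this step
    # would be a parallel compaction rather than a loop.
    for i in range(1 << bits):
        j = f(i)
        if j < n:
            out.append(items[j])
    return out


shuffled = bijective_shuffle(list(range(10)), seed=2022)
```

Because `f` is a bijection on the padded range, each input index appears exactly once among the kept images, so the output is always a permutation of the input, and the same seed always yields the same permutation. Each element is read once and written once, which mirrors the single read and write per input described in the abstract.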

Journal: ACM Transactions on Parallel Computing
Publication Date: Jan 31, 2022
Citations: 2

Similar Papers

On The Exact Security of Message Authentication Using Pseudorandom Functions
Ashwin Jha ... Avradip Mandal
IACR Transactions on Symmetric Cryptology | 08 Mar 2017

On The Exact Security of Message Authentication Using Pseudorandom Functions
IACR Cryptology ePrint Archive | VOL. 2017 | 08 Mar 2017

A Critique of the PRAM Model of Computation
Alok Aggarwal
01 Jan 1989

An Optimal Parallel Algorithm for Computing the Summed Area Table on the GPU
Yutaro Emoto ... Koji Nakano
01 May 2018
