Abstract
We compare different methods for sampling from discrete or piecewise constant probability distributions and introduce a new algorithm that is especially efficient on massively parallel processors such as GPUs. The scheme preserves the distribution properties of the input sequence, has constant average-case time complexity, and significantly lowers the average number of operations for certain distributions when sampling is performed in a parallel algorithm that requires synchronization. We also propose a highly efficient massively parallel algorithm for constructing the required auxiliary data structure, which avoids the load-balancing issues of naive approaches.
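The abstract does not spell out the paper's algorithm, but the standard baseline it competes with is the alias method, which also samples a discrete distribution in constant time after linear-time preprocessing. The following is a minimal single-threaded sketch (Vose's variant); all function names are illustrative and this is not the paper's GPU-specific construction.

```python
import random

def build_alias_table(probs):
    """Vose's alias method: O(n) preprocessing enabling O(1) sampling.
    `probs` must be nonnegative and sum to 1. Each table slot i holds a
    threshold prob[i] and a fallback index alias[i]."""
    n = len(probs)
    scaled = [p * n for p in probs]          # rescale so the mean is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()                      # underfull slot
        l = large.pop()                      # overfull slot donates mass
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]         # remove the donated mass
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:                  # leftovers are exactly full
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a slot uniformly, then branch once."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

On a GPU the table lookup itself parallelizes trivially; the interesting part, and the focus of the paper, is constructing such auxiliary structures in parallel without the serial pairing loop above.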