Abstract

The aim of this paper is to present new high-performance implementations of Marsa-LFIB4 which is an example of high-quality multiple recursive pseudorandom number generators. We propose an algorithmic approach that combines language-based vectorization techniques together with a new divide-and-conquer parallel method that exploits a special sparse structure of the matrix obtained from the recursive formula that defines the generator. Our portable OpenACC implementation achieves the performance comparable to those achieved by our CUDA-based and OpenMP-based implementations on GPUs and multicore CPUs, respectively.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call