Using distributed memory parallel computers and GPU clusters for multidimensional Monte Carlo integration

Dominik Szałkowski, Przemysław Stpiczyński

doi:10.1002/cpe.3365

Abstract

The aim of this paper is to show that multidimensional Monte Carlo integration can be efficiently implemented on various distributed memory parallel computers and clusters of multicore nodes using recently developed parallel versions of the linear congruential generator (LCG) and lagged Fibonacci generator (LFG) pseudorandom number generators. We show how to accelerate the overall performance by offloading some computations to Graphics Processing Units (GPUs), and we discuss how to transform Message Passing Interface (MPI) + OpenMP programs into the MPI + OpenMP + CUDA model. We explain how to utilize the multiple cores of CPUs together with multiple GPU accelerators within a single node and how to achieve reasonable load balancing across all computational resources of GPU‐accelerated multicore nodes. We present and discuss the results of experiments performed on the following target architectures: an IBM Blue Gene/Q parallel computer, a cluster of Intel Xeon E5‐2660 servers, and a Tesla‐based GPU cluster with Intel Xeon X5650 multicore processors. The results are presented from two points of view: strong scaling and weak scaling. We also compare the performance of all considered architectures. Copyright © 2014 John Wiley & Sons, Ltd.
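The abstract does not describe how the parallel generators work. The following minimal sketch illustrates one standard way to parallelize an LCG, leapfrog splitting. It is not the authors' implementation. If the base sequence is x_{n+1} = (a x_n + c) mod m, then p workers can each take every p-th element: worker k produces x_k, x_{k+p}, x_{k+2p}, ... through the combined map x_{n+p} = (a^p x_n + c(a^{p-1} + ... + a + 1)) mod m. The sketch makes several assumptions that do not come from the paper: the modulus 2^64, Knuth's MMIX constants for a and c, the helper names `leapfrog_params`, `leapfrog_stream`, `to_unit` and `mc_integrate`, and the choice of Python. The p workers, which would be MPI processes or OpenMP threads in a real code, are simulated sequentially in a single process.

```python
# Hypothetical sketch: leapfrog-split 64-bit LCG driving a Monte Carlo
# estimate of an integral over the unit cube [0,1]^d.
# Assumptions (not from the paper): modulus 2**64, Knuth's MMIX constants,
# and p workers simulated one after another in a single process.

MASK = (1 << 64) - 1          # arithmetic mod 2**64
A = 6364136223846793005       # assumed multiplier a
C = 1442695040888963407       # assumed increment c


def leapfrog_params(p):
    """Return (A_p, C_p) such that x_{n+p} = (A_p * x_n + C_p) mod 2**64."""
    a_p, c_p = 1, 0           # identity map x -> 1*x + 0
    for _ in range(p):
        # compose x -> A*x + C after the current map (A_p, C_p)
        a_p, c_p = (A * a_p) & MASK, (A * c_p + C) & MASK
    return a_p, c_p


def leapfrog_stream(seed, k, p):
    """Yield x_k, x_{k+p}, x_{k+2p}, ... of the base sequence with x_0 = seed."""
    x = seed & MASK
    for _ in range(k):        # jump to x_k using the base generator
        x = (A * x + C) & MASK
    a_p, c_p = leapfrog_params(p)
    while True:
        yield x
        x = (a_p * x + c_p) & MASK


def to_unit(x):
    """Map a 64-bit state to [0, 1) using its top 53 bits."""
    return (x >> 11) / float(1 << 53)


def mc_integrate(f, d, n, p, seed=12345):
    """Estimate the integral of f over [0,1]^d with n points split over p workers."""
    total = 0.0
    for k in range(p):        # each k stands in for one MPI process / thread
        stream = leapfrog_stream(seed, k, p)
        m = n // p + (1 if k < n % p else 0)   # this worker's share of points
        for _ in range(m):
            point = [to_unit(next(stream)) for _ in range(d)]
            total += f(point)
    return total / n          # in MPI this sum would be combined by a reduction
```

For example, `mc_integrate(lambda x: sum(t * t for t in x), d=3, n=200_000, p=4)` should return a value close to 1, which is the exact integral of x1² + x2² + x3² over the unit cube. The p streams are disjoint residue classes of a single global sequence, so they never overlap. For a fixed seed and p, the combined sample is also reproducible. The paper's LFG-based variant and its GPU offloading are not modeled here.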
