Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark

S J Pennycook, S A Jarvis, S D Hammond, G R Mudalige

doi:10.1145/1964218.1964223

Open Access

https://doi.org/10.1145/1964218.1964223

Journal: ACM SIGMETRICS Performance Evaluation Review
Publication Date: Mar 29, 2011
Citations: 76

Affiliation: University of Warwick, University of Oxford

#NAS Parallel Benchmark #Compute Unified Device Architecture

Abstract

We present a performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. Execution times are reported for several different GPUs, ranging from low-end consumer-grade products to high-end HPC-grade devices, including the Tesla C2050 built on NVIDIA's Fermi processor. We also utilise recently developed performance models of LU to facilitate a comparison between future large-scale distributed clusters of GPU devices and existing clusters built on traditional CPU architectures, including a quad-socket, quad-core AMD Opteron cluster and an IBM BlueGene/P.

Full Text