Optimizing the HPCC RandomAccess benchmark on Blue Gene/L Supercomputer

Rahul Garg, Yogish Sabharwal

doi:10.1145/1140103.1140324

Abstract

The performance of supercomputers has traditionally been evaluated using the LINPACK benchmark [3], which stresses only the floating-point units without significantly loading the memory or network subsystems. The HPC Challenge (HPCC) benchmark suite is being proposed as an alternative for evaluating supercomputer performance. It consists of seven benchmarks, each designed to measure a specific aspect of system performance: (i) High Performance Linpack (HPL); (ii) DGEMM, which measures the floating-point rate of execution of double-precision real matrix-matrix multiplication; (iii) STREAM, which measures sustainable memory bandwidth and the corresponding computation rate for four simple vector kernels, namely copy, scale, add, and triad; (iv) PTRANS, which exercises the network by taking the parallel transpose of a large distributed matrix; (v) RandomAccess, which measures the rate of integer updates to random memory locations; (vi) FFT, which measures the floating-point rate of execution of a double-precision complex one-dimensional discrete Fourier transform (DFT); and (vii) communication bandwidth and latency, which measures the latency and bandwidth of a number of simultaneous communication patterns. In this paper we outline the optimization techniques used to obtain the best performance reported to date for the HPCC RandomAccess benchmark on the Blue Gene/L supercomputer.
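To make "integer updates to random memory locations" concrete, the following is a minimal single-process sketch of a RandomAccess-style update loop. It is an illustration only, not the authors' Blue Gene/L implementation or the HPCC reference code: the function names are hypothetical, and the feedback constant `POLY = 0x7`, the seed, and the table size are assumptions chosen for the example. Each step advances a 64-bit pseudo-random value and XORs it into the table slot selected by its low bits.

```python
POLY = 0x7                 # assumed feedback constant for the shift-register generator
MASK64 = (1 << 64) - 1     # keep values within 64 bits


def next_random(a):
    """Advance the 64-bit pseudo-random sequence by one step."""
    high_bit_set = a >> 63          # 1 if the top bit is set, else 0
    a = (a << 1) & MASK64           # shift left, discarding overflow
    return a ^ POLY if high_bit_set else a


def apply_updates(table, num_updates, seed=1):
    """XOR num_updates pseudo-random values into pseudo-random table slots.

    len(table) must be a power of two so the low bits of each random
    value can be used directly as an index.
    """
    index_mask = len(table) - 1
    a = seed
    for _ in range(num_updates):
        a = next_random(a)
        table[a & index_mask] ^= a
    return table


if __name__ == "__main__":
    size = 1 << 4                      # tiny table for illustration
    table = list(range(size))          # table[i] starts as i
    apply_updates(table, 4 * size)     # scatter updates across the table
    apply_updates(table, 4 * size)     # replaying the same stream undoes them
    print(table == list(range(size)))
```

Because XOR is its own inverse, replaying the same update stream restores the initial table, which gives a simple correctness check. The access pattern has essentially no locality, so on a distributed machine most updates target memory owned by another node; that is why this kind of benchmark stresses the memory and network subsystems rather than the floating-point units.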

Full Text