Efficient Utilization of a CPU-GPU Cluster

Gopal Patnaik, Douglas Schwer, Keith Obenschain, Andrew Corrigan, David Fyfe

doi:10.2514/6.2012-563

Abstract

This paper investigates the performance of a mix of central processing unit (CPU) and graphics processing unit (GPU) codes on a multi-CPU, multi-GPU cluster. The cluster is designed to balance I/O, GPU, and CPU performance so that it can accommodate a wide variety of codes; a balanced system was one of many design options that could have been chosen. The GPU is essentially a video graphics card of the kind found in every desktop or laptop computer. High-end graphics cards, such as those used by computer gamers, are capable of extremely high floating-point performance. The GPU relies on the CPU to initialize it, to transfer data between memory or storage and the GPU, and to launch the computation kernels that run on the GPU. The Jet Engine Noise Reduction (JENRE) code implements a compressible flow solver that is under development for the simulation of supersonic jet flow and its acoustic properties. A major emphasis of the code's development is ensuring that it can fully exploit emerging massively parallel, high-performance computing architectures, whether GPUs or multi-core CPUs. The JENRE code's performance on GPUs is currently 2.1 times that on CPUs, and it is therefore typically run on the GPUs in the cluster. The cluster is also used for a variety of MPI-based jobs as well as single-node OpenMP shared-memory jobs. These jobs use only the CPUs, leaving the GPUs idle. Typically, a user requests that an entire node (or set of nodes) be allocated to a single job (CPU or GPU) so that there is no contention for resources with other jobs. Because jobs are either CPU-based or GPU-based, this leads to significant under-utilization of the computational resources. This paper examines the overall utilization of the cluster and the performance of a mix of CPU codes running simultaneously with the GPU-based JENRE code on the same nodes of the cluster. Results indicate that careful and cooperative scheduling of jobs can result in a tripling of the computational capability of the cluster.
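
One way to see how a 2.1x GPU speedup could relate to a roughly threefold gain is a back-of-the-envelope throughput estimate. The sketch below is not taken from the paper. It assumes idealized co-scheduling, in which a CPU job and a GPU job share a node without slowing each other down, and it takes the CPU-only throughput of a node as the unit of work. The function name and the efficiency parameters are hypothetical, introduced only for illustration.

```python
def combined_throughput(gpu_speedup: float,
                        cpu_efficiency: float = 1.0,
                        gpu_efficiency: float = 1.0) -> float:
    """Estimate node throughput when a CPU job and a GPU job run together.

    Throughput is expressed relative to the same node running a CPU-only
    job (which is defined as 1.0). The efficiency factors (assumed, not
    measured) model any slowdown caused by sharing the node; 1.0 means
    no contention at all.
    """
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    for eff in (cpu_efficiency, gpu_efficiency):
        if not 0.0 <= eff <= 1.0:
            raise ValueError("efficiency factors must lie in [0, 1]")
    cpu_part = 1.0 * cpu_efficiency          # CPU job's contribution
    gpu_part = gpu_speedup * gpu_efficiency  # GPU job's contribution
    return cpu_part + gpu_part


if __name__ == "__main__":
    # 2.1 is the GPU-to-CPU performance ratio quoted in the abstract.
    ideal = combined_throughput(2.1)
    print(f"Ideal co-scheduled throughput vs CPU-only node: {ideal:.2f}x")
    print(f"Ideal co-scheduled throughput vs GPU-only node: {ideal / 2.1:.2f}x")
```

Under these assumptions the estimate is relative to a CPU-only baseline; the gain over a node that already runs the GPU job alone is smaller. Lowering either efficiency factor below 1.0 models contention for shared resources such as memory bandwidth or the CPU cores needed to drive the GPU.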

Full Text