Abstract
Heterogeneous systems composed of multiple CPUs and GPUs are increasingly attractive platforms for high performance computing. With the evolution of general purpose computation on GPUs (GPGPU) and the corresponding programming frameworks (OpenCL and CUDA), more applications use GPUs as co-processors to achieve performance that traditional processors alone cannot deliver. The main problem, however, is identifying which task or job should be allocated to which device. The problem is further complicated by the dissimilar computational power of the CPU and the GPU. In this work we propose a new scheduling strategy, WT_DMDA, which aims to optimise the performance of the preconditioned conjugate gradient solver in a CPU-GPU heterogeneous environment. We use the StarPU runtime system to assess the efficiency of the approach on a computational platform consisting of three NVIDIA Fermi GPUs and 12 Intel CPUs. We show that significant speedups (up to 5.13×) can be reached relative to StarPU's default scheduler when processing large matrices, and that performance remains advantageous as the granularity of tasks is varied. We analyse and evaluate these results.