Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures

Bruno Raffin, João V.F. Lima, Nicolas Maillard, Vincent Danjean, Thierry Gautier

doi:10.1016/j.parco.2015.03.001

Abstract

We evaluated four scheduling strategies for multi-CPU and multi-GPU architectures. We designed a framework with performance models for task and transfer prediction. Work stealing is efficient with task annotations and data locality heuristics. The HEFT cost model performs better on very regular computations.

In this paper, we present a comparison of scheduling strategies for heterogeneous multi-CPU and multi-GPU architectures. We designed and evaluated four scheduling strategies on top of the XKaapi runtime: work stealing, data-aware work stealing, locality-aware work stealing, and Heterogeneous Earliest-Finish-Time (HEFT). On a heterogeneous architecture with 12 CPUs and 8 GPUs, we analysed our scheduling strategies with four benchmarks: a BLAS-1 AXPY vector operation, a Jacobi 2D iterative computation, and two linear algebra algorithms, Cholesky and LU. We conclude that work stealing may be efficient if task annotations are given along with a data locality strategy. Furthermore, our experimental results suggest that HEFT scheduling performs better on applications with very regular computations and low data locality.
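
To make the earliest-finish-time idea behind HEFT concrete, the following is a minimal sketch of choosing a worker for one ready task from predicted execution and transfer times. It is not the XKaapi implementation: the Worker record, the function earliest_finish_worker, and all timings are hypothetical, and the sketch assumes transfers do not overlap with computation on the target worker.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    ready_at: float       # time at which the worker becomes free (hypothetical units)
    exec_time: float      # predicted execution time of the task on this worker
    transfer_time: float  # predicted time to move the task's inputs to this worker


def earliest_finish_worker(workers):
    """Return (worker, finish_time) with the smallest predicted finish time."""
    def finish(w):
        # Assumption: the transfer starts only once the worker is free,
        # so the three terms simply add up.
        return w.ready_at + w.transfer_time + w.exec_time

    best = min(workers, key=finish)
    return best, finish(best)


# Made-up predictions for a single task on three workers.
workers = [
    Worker("cpu0", ready_at=0.0, exec_time=8.0, transfer_time=0.0),
    Worker("gpu0", ready_at=3.0, exec_time=1.0, transfer_time=2.0),
    Worker("gpu1", ready_at=0.0, exec_time=1.0, transfer_time=4.0),
]
best, finish_time = earliest_finish_worker(workers)
print(best.name, finish_time)
```

In this made-up case the idle GPU is selected even though it pays the largest transfer cost, because its predicted finish time is still the smallest. A cost model of this kind depends on the accuracy of the predictions, which is consistent with the observation that HEFT suits very regular computations.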

Full Text