Influence of Tasks Duration Variability on Task-Based Runtime Schedulers

Olivier Beaumont, Lionel Eyraud-Dubois, Yihong Gao

doi:10.1109/ipdpsw.2019.00013

Abstract

In the context of HPC platforms, individual nodes nowadays consist of heterogeneous processing resources such as GPUs and multicore CPUs. These resources share communication and storage resources, which induces complex co-scheduling effects and makes it hard to predict the exact duration of a task or of a communication. To cope with these issues, dynamic runtime schedulers such as StarPU have been developed. These systems base their decisions at runtime on the state of the platform and possibly on static task priorities computed offline. In this paper, our goal is to quantify performance variability in the context of heterogeneous HPC nodes, focusing on very regular dense linear algebra kernels, such as Cholesky and LU factorizations. We therefore first evaluate the variability of the individual block-sized kernels. Then, we analyze the impact of this variability at the scale of a full application run under a dynamic runtime scheduler such as StarPU. The aim is to determine whether the strategies designed in the context of MapReduce applications to cope with stragglers could be transferred to HPC systems, or whether the dynamic nature of runtime schedulers is enough to cope with actual performance variations, even in the presence of task dependencies.

Full Text