Abstract

We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit e.g. custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the smart data containers’ memory consistency model for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and target especially the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.
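As a brief illustration of the multi-valued return feature mentioned above, the following minimal sketch shows a Map user function that produces two results per element in a single skeleton call. It assumes the SkePU 3 constructs skepu::Map, skepu::Vector, skepu::multiple and skepu::ret as described in the paper; the user function sum_and_prod, the container names, and the sizes are illustrative only and not taken from the paper.

    #include <skepu>

    // User function with multi-valued return: the declared return type
    // skepu::multiple<...> and the skepu::ret(...) helper bundle several
    // results from one Map invocation (SkePU 3 feature; check the SkePU
    // documentation for the exact current spelling).
    skepu::multiple<float, float>
    sum_and_prod(float a, float b)
    {
        return skepu::ret(a + b, a * b);
    }

    int main()
    {
        auto pairwise = skepu::Map<2>(sum_and_prod);

        skepu::Vector<float> a(1000, 2.f), b(1000, 3.f);
        skepu::Vector<float> sums(1000), prods(1000);

        // One skeleton call writes both output containers.
        pairwise(sums, prods, a, b);
        return 0;
    }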

Highlights

  • The recently observed slowdown of Moore’s Law implies, for the foreseeable future, that further performance growth in high-performance computing (HPC) critically depends on efficiently utilizing hardware resources, leveraging even more heterogeneity in the form of accelerators such as GPUs and scaling up to even higher degrees of cluster-level parallelism

  • The third generation of SkePU presented in this paper is the result of a two-year co-design effort taking into account the feedback from HPC application partners in the EU H2020 project EXA2PRO, and aims at striking a good balance between improved programmability for HPC applications and decent performance and scalability on HPC platforms while keeping the strict portability approach of SkePU

  • The main new features include new skeletons, new data container types for multidimensional data and scalable data movement at distributed execution, support for returning multiple objects from skeleton instances and user functions, support for specifying optional, platform-specific variants of user functions to exploit e.g. custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system (see the backend-selection sketch after this list)
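As a concrete, hedged example of how a skeleton instance might be bound to one of the multicore CPU backends, the sketch below uses SkePU's backend specification mechanism. skepu::Map, skepu::Vector, skepu::BackendSpec, skepu::Backend::Type::OpenMP, setCPUThreads and setBackend are the SkePU constructs assumed here; the axpy user function, thread count, and problem size are illustrative. The new scheduling variants for the OpenMP backend (see the paper's section on dynamic scheduling) are configured through the same backend specification, but their exact interface is not shown here.

    #include <skepu>

    // Element-wise user function; the trailing scalar 'a' is a uniform argument.
    float axpy(float x, float y, float a)
    {
        return a * x + y;
    }

    int main()
    {
        // Map<2>: two element-wise container arguments, remaining arguments uniform.
        auto saxpy = skepu::Map<2>(axpy);

        // Explicitly request the multicore CPU (OpenMP) backend with 16 threads.
        skepu::BackendSpec spec{skepu::Backend::Type::OpenMP};
        spec.setCPUThreads(16);
        saxpy.setBackend(spec);

        skepu::Vector<float> x(1 << 20, 1.f), y(1 << 20, 2.f), r(1 << 20);
        saxpy(r, x, y, 2.f);   // uniform scalar passed after the containers
        return 0;
    }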


Summary

Introduction

The recently observed slowdown of Moore’s Law implies, for the foreseeable future, that further performance growth in high-performance computing (HPC) critically depends on efficiently utilizing hardware resources, leveraging even more heterogeneity in the form of accelerators such as GPUs, and scaling up to even higher degrees of cluster-level parallelism. This leads to programmability and portability issues on such increasingly heterogeneous and parallel systems. Skeleton programming frameworks such as SkePU address these issues by raising the level of abstraction. While the cost of abstraction might be a certain loss in efficiency compared to explicitly parallel code written by system experts, the abstraction can even lead to higher performance where the better structuring and the knowledge of dependence patterns enable automated optimizations.

A Short History of SkePU
MapPairs
Multi-Valued Return in Map Skeletons
Dynamic Scheduling with OpenMP Backends
Revised Syntax for MapOverlap
Tensors
MatRow Container Proxy
Related Work
Findings
Conclusions and Future Work