Abstract
Heterogeneous systems are the core architecture of most computing systems, from high-performance computing nodes to embedded devices, due to their excellent performance and energy efficiency. Programming these systems efficiently has become a major challenge because of the complexity of their architectures and the effort required to provide them with co-execution capabilities that applications can fully exploit. There are many proposals to simplify the programming and management of acceleration devices and multi-core CPUs. However, in many cases, portability and ease of use compromise the efficiency of the individual devices, even more so when co-executing. Intel oneAPI, a new and powerful standards-based unified programming model built on top of SYCL, addresses these issues. In this paper, oneAPI is provided with co-execution strategies to run the same kernel across different devices, enabling the exploitation of static and dynamic policies. This work evaluates the performance and energy efficiency of a well-known set of regular and irregular HPC benchmarks on two heterogeneous systems, each composed of an integrated GPU and a CPU. Static and dynamic load balancers are integrated and evaluated, highlighting single-device and co-execution strategies and the key points of this promising technology. Experimental results show that co-execution is worthwhile when using dynamic algorithms and improves the efficiency even further when using unified shared memory.
Highlights
In recent years, the quest to constantly improve the performance and energy efficiency of computing systems, together with the growing diversity of architectures and computing devices, has made it possible to tackle a wide variety of problems with heterogeneous systems.
We propose to provide oneAPI with mechanisms that allow the implementation of co-execution without additional effort for the programmer.
The proposed Coexecutor Runtime is built on top of oneAPI as a runtime library that allows the CPU to be exploited in parallel with multiple hardware accelerators and facilitates the implementation of workload balancing algorithms.
Summary
The quest to constantly improve the performance and energy efficiency of computing systems, together with the growing diversity of architectures and computing devices, has made it possible to tackle a wide variety of problems with heterogeneous systems. oneAPI's cross-architecture language, Data Parallel C++ (DPC++) [25], is based on the SYCL standard for heterogeneous programming in C++ and provides a single, unified, open development model for productive heterogeneous programming with cross-vendor support. It allows code reuse across hardware targets while permitting custom tuning for a specific accelerator. This article addresses a new challenge in improving the usability and exploitation of heterogeneous systems: providing oneAPI with the capacity for co-execution. Co-execution is defined as the collaboration of all the devices in the system (including the CPU) to execute a single massively data-parallel kernel [14,26,27,28].
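The difference between static and dynamic policies for co-executing one kernel can be illustrated with a small sketch. This is not oneAPI or SYCL code and not the paper's Coexecutor Runtime: it is a plain C++ simulation in which the function names (`static_split`, `dynamic_split`), the `speeds` vector of relative device throughputs, and the fixed `chunk` size are all hypothetical, chosen only to show how an index range might be divided between a CPU and a GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [begin, end) of the kernel's iteration space.
using Range = std::pair<std::size_t, std::size_t>;

// Static policy (sketch): split [0, n) once, before execution, in
// proportion to each device's assumed relative speed (hypothetical values).
std::vector<Range> static_split(std::size_t n, const std::vector<double>& speeds) {
    double total = 0.0;
    for (double s : speeds) total += s;

    std::vector<Range> parts;
    std::size_t begin = 0;
    double acc = 0.0;
    for (std::size_t d = 0; d < speeds.size(); ++d) {
        acc += speeds[d];
        // The last device takes the remainder so the whole range is covered.
        std::size_t end = (d + 1 == speeds.size())
            ? n
            : static_cast<std::size_t>(static_cast<double>(n) * acc / total);
        parts.emplace_back(begin, end);
        begin = end;
    }
    return parts;
}

// Dynamic policy (simulated sketch): the range is cut into fixed-size
// chunks, and the device that becomes idle first claims the next chunk.
// Returns the number of work items each device ends up processing.
// Assumes chunk > 0 and every speed > 0.
std::vector<std::size_t> dynamic_split(std::size_t n, std::size_t chunk,
                                       const std::vector<double>& speeds) {
    std::vector<double> busy_until(speeds.size(), 0.0);  // simulated time
    std::vector<std::size_t> items(speeds.size(), 0);
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t size = std::min(chunk, n - begin);
        // Pick the device that finishes its current work earliest.
        std::size_t d = static_cast<std::size_t>(
            std::min_element(busy_until.begin(), busy_until.end()) - busy_until.begin());
        busy_until[d] += static_cast<double>(size) / speeds[d];
        items[d] += size;
    }
    return items;
}
```

With two devices and assumed speeds `{1.0, 4.0}`, `static_split(100, speeds)` assigns the ranges [0, 20) and [20, 100) up front, while `dynamic_split(100, 10, speeds)` reaches a similar balance by handing out chunks on demand. The static split depends on knowing the relative speeds in advance; the dynamic one adapts at run time, which matters most for irregular kernels where the cost per work item varies.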