Abstract

Heterogeneous computing systems provide high performance and energy efficiency. However, to utilize such systems optimally, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance- and energy-aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications, our approach determines a near-optimal host-device distribution of work, the number of processing units required, and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with a GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of the reasonable configurations. Furthermore, estimating the performance per Joule of system configurations with our machine learning model is more than 1000 times faster than evaluating the system by program execution.

Highlights

  • Accelerators are often used collaboratively with general-purpose CPUs to increase the overall system performance and improve energy efficiency

  • The optimization process generates system configurations by randomly selecting parameter values and evaluates their performance using a machine learning model

  • Optimal work-sharing among available CPUs and accelerators is not obvious
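The optimization loop described above can be sketched as a random search over the parameter space, with a surrogate model standing in for program execution. This is a minimal illustration, not the paper's implementation: the parameter names, value ranges, and the toy scoring function are assumptions, and the paper's actual evaluator is a trained machine learning model predicting performance per Joule.

```python
import random

# Hypothetical parameter space; names and value ranges are illustrative,
# not taken from the paper.
PARAM_SPACE = {
    "threads": [16, 32, 64, 128, 244],
    "affinity": ["compact", "scatter", "balanced"],
    "host_fraction": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],  # share of work on host CPUs
}

def surrogate_perf_per_joule(config):
    """Stand-in for a trained ML model: returns a score estimating
    performance per Joule for a configuration (toy formula only)."""
    score = config["threads"] * (1.0 - abs(config["host_fraction"] - 0.4))
    if config["affinity"] == "balanced":
        score *= 1.1
    return score

def random_search(n_samples, seed=0):
    """Sample n_samples random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_samples):
        cfg = {name: rng.choice(values) for name, values in PARAM_SPACE.items()}
        score = surrogate_perf_per_joule(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best, score = random_search(50)
print(best, score)
```

Because the surrogate model is cheap to query, many more candidate configurations can be scored per unit time than by actually executing the program, which is what makes evaluating only a small fraction of the space practical.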


Introduction

Accelerators (such as GPUs or the Intel Xeon Phi) are often used collaboratively with general-purpose CPUs to increase the overall system performance and improve energy efficiency. Due to different architectural characteristics and the large number of system parameter configurations (such as the number of threads, thread affinity, and workload partitioning between the multi-core processors of the host and the accelerating devices), achieving a workload distribution that results in optimal performance and energy efficiency on heterogeneous systems is a non-trivial task [5,6].

We describe the heterogeneous computing systems and applications that we use in this paper to illustrate and evaluate our approach. Ida comprises two Intel Xeon E5-2650 v4 general-purpose CPUs on the host and one GeForce GTX Titan X GPU as accelerator. The GPU device has 24 Streaming Multiprocessors (SM), with in total 3072 CUDA cores running at a base frequency of 1 GHz.
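The host-device workload distribution mentioned above can be made concrete with a small sketch: splitting a data-parallel workload between the host CPUs and an accelerator according to a host fraction. The function name and interface are illustrative assumptions, not the paper's API.

```python
def partition_work(total_items, host_fraction):
    """Split a data-parallel workload between host CPUs and an accelerator.

    host_fraction is the share of items assigned to the host, in [0.0, 1.0];
    the remainder goes to the accelerating device. Illustrative only.
    """
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0.0 and 1.0")
    host_items = int(total_items * host_fraction)
    device_items = total_items - host_items
    return host_items, device_items

# Example: distribute 3072 work items, 40% to the host CPUs.
host, device = partition_work(3072, 0.4)
print(host, device)
```

Finding the host fraction that balances the two sides is exactly the non-obvious part: the optimal split depends on the relative throughput and energy profiles of the CPUs and the accelerator for the given application.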


