Abstract
Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy's forthcoming exascale systems. Since the execution model for GPUs differs from that of conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools for such GPU-accelerated systems are needed to help developers assess how well applications offload computation onto GPUs. In this paper, we describe extensions to Rice University's HPCToolkit performance tools that support measurement and analysis of Intel's DPC++ programming model for GPU-accelerated systems atop an implementation of the industry-standard OpenCL framework for heterogeneous parallelism on Intel GPUs. HPCToolkit supports three techniques for performance analysis of programs atop OpenCL on Intel GPUs. First, HPCToolkit supports profiling and tracing of OpenCL kernels. Second, HPCToolkit supports CPU-GPU blame shifting for OpenCL kernel executions, a profiling technique that can identify code that executes on one or more CPUs while GPUs are idle. Third, HPCToolkit supports fine-grained measurement, analysis, and attribution of performance metrics to OpenCL GPU kernels, including instruction counts, execution latency, and SIMD waste. The paper describes these capabilities and then illustrates their application in case studies with two applications that offload computations onto Intel GPUs.
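As background for the kernel profiling and tracing capability summarized above, the sketch below illustrates the standard OpenCL event-profiling mechanism (CL_QUEUE_PROFILING_ENABLE together with clGetEventProfilingInfo) that a tool can use to obtain GPU-side kernel execution times. This is a minimal illustration of the underlying OpenCL interface, not HPCToolkit's implementation; the wrapper name profiled_enqueue_kernel is hypothetical.

/* Minimal sketch: timing one OpenCL kernel launch via event profiling.
 * Assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE
 * (e.g., via clCreateCommandQueueWithProperties). The function name
 * profiled_enqueue_kernel is hypothetical and used for illustration only. */
#include <stdio.h>
#include <CL/cl.h>

cl_int profiled_enqueue_kernel(cl_command_queue queue, cl_kernel kernel,
                               cl_uint work_dim,
                               const size_t *global_size,
                               const size_t *local_size)
{
    cl_event evt;
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, work_dim,
                                        NULL, global_size, local_size,
                                        0, NULL, &evt);
    if (err != CL_SUCCESS) return err;

    /* Block until the kernel completes so the profiling counters are valid.
     * A real measurement tool would instead harvest events asynchronously
     * to avoid serializing the application. */
    clWaitForEvents(1, &evt);

    cl_ulong start = 0, end = 0;
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    printf("kernel GPU time: %.3f us\n", (end - start) / 1e3);

    clReleaseEvent(evt);
    return CL_SUCCESS;
}

In practice, a profiler intercepts the application's own kernel launches (for example, by wrapping the OpenCL API) rather than requiring code changes; the timestamps gathered this way can then be attributed to calling contexts for profiles and traces.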