GPA: A GPU Performance Advisor Based on Instruction Sampling

Keren Zhou, Ryuichi Sai, John Mellor-Crummey, Xiaozhu Meng

doi:10.1109/cgo51591.2021.9370339

Abstract

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained tuning advice at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimizations at a hierarchy of levels, including individual lines, loops, and functions. To relieve users of the burden of interpreting performance counters and analyzing bottlenecks, GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with optimization strategies. To quantify the potential benefits of each optimization strategy, we developed PC sampling-based performance models to estimate its speedup. Our experiments with benchmarks and applications show that GPA provides insightful reports to guide performance optimization. Using GPA, we obtained speedups on a Volta V100 GPU ranging from 1.01× to 3.58×, with a geometric mean of 1.22×.
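
To make the idea of a sampling-based speedup estimate concrete, the following is a minimal sketch and not GPA's actual model. It assumes that each PC sample carries a single stall-reason label, that execution time is proportional to the number of samples, and that an optimization removes some fraction of the samples attributed to one stall reason. The function name `estimate_speedup`, the stall-reason strings, and the sample counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def estimate_speedup(samples, target_reason, removable_fraction=1.0):
    """Estimate speedup if a fraction of the samples with `target_reason` disappear.

    Assumes run time is proportional to the total number of PC samples,
    which is a simplification made for this sketch.
    """
    if not 0.0 <= removable_fraction <= 1.0:
        raise ValueError("removable_fraction must be in [0, 1]")

    counts = Counter(samples)          # samples per stall reason
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples provided")

    removed = counts[target_reason] * removable_fraction
    remaining = total - removed
    if remaining <= 0:
        raise ValueError("optimization would remove every sample")

    return total / remaining


# Hypothetical sample profile: 100 samples with made-up stall-reason labels.
samples = (
    ["memory_dependency"] * 40
    + ["execution_dependency"] * 10
    + ["issued"] * 50
)

# If an optimization hides half of the memory-dependency stalls:
speedup = estimate_speedup(samples, "memory_dependency", removable_fraction=0.5)
print(f"estimated speedup: {speedup:.2f}x")
```

With these made-up numbers, 20 of the 100 samples are removed, so the estimate is 100 / 80. A real advisor would need to account for stalls that overlap with useful work, which this sketch ignores.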


Similar Papers

A comparative study of GPU programming models and architectures using neural networks
Vivek K Pallipuram ... Melissa C Smith
The Journal of Supercomputing | VOL. 61
31 May 2011

Starlight: A kernel optimizer for GPU processing
Alberto Zeni ... Marco D Santambrogio
Journal of Parallel and Distributed Computing | VOL. 187
22 Dec 2023

An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications
Keren Zhou ... John Mellor-Crummey
IEEE Transactions on Parallel and Distributed Systems | VOL. 33
01 Apr 2022

A Multi-Level Platform-Independent GPU API for High-Level Programming Models
Akihiro Hayashi ... Sri Raj Paul
01 Jan 2021
