Microarchitectural performance characterization of irregular GPU kernels

Molly A O'Neil, Martin Burtscher

doi:10.1109/iiswc.2014.6983052

Abstract

GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of a suite of optimized irregular CUDA applications on a cycle-accurate GPU simulator. We characterize the performance bottlenecks in each program and connect source code with microarchitectural characteristics. We also assess the impact of improvements in cache and DRAM bandwidth and latency and discuss the implications for GPU architecture design. We find that, while irregular graph codes exhibit significantly more underutilized execution cycles due to branch divergence, load imbalance, and synchronization overhead than regular programs, these factors contribute less to performance degradation than we expected. It appears that code optimizations are often able to effectively address these performance hurdles. Insufficient bandwidth and long memory latency are the biggest limiters of performance. Surprisingly, we find that applications with irregular memory access patterns are more sensitive to changes in L2 latency and bandwidth than to changes in DRAM latency and bandwidth.

Similar Papers

DSPatch
Rahul Bera ... Anant V Nori
12 Oct 2019

APMC
Tassadaq Hussain ... Adrián Cristal
26 Feb 2014

Optimizing Indirect Memory References with milk
Vladimir Kiriansky ... Yunming Zhang
11 Sep 2016

DRAM Bandwidth and Latency Stacks: Visualizing DRAM Bottlenecks
Stijn Eyerman ... Ibrahim Hur
01 May 2022
