Abstract

The massively parallel architecture of graphics processing units (GPUs) boosts performance for a wide range of applications. Initially, GPUs employed only scratchpad memory as on-chip memory. More recently, to broaden the scope of applications that GPUs can accelerate, GPU vendors have added caches as on-chip memory in newer generations of GPUs. Unfortunately, GPU caches face many performance challenges that arise from excessive thread contention for cache resources. Cache bypassing, in which memory requests can selectively bypass the cache, is one solution that helps mitigate this contention. In this paper, we propose coordinated static and dynamic cache bypassing to improve GPU application performance. At compile time, we identify the global loads that show strong preferences for caching or bypassing and encode this classification into the application binary. For the remaining global loads, our dynamic cache bypassing has the flexibility to cache only a fraction of threads. In addition to coordinated bypassing, we also develop a bypass-aware warp scheduler that adaptively adjusts the scheduling policy based on cache performance. Evaluations show that our coordinated static and dynamic cache bypassing achieves up to $2.28\times$ (average $1.32\times$) speedup for a variety of GPU applications. When we combine coordinated cache bypassing with the bypass-aware scheduler, the average speedup further improves to $1.38\times$.
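To illustrate the static side of the idea, the sketch below shows how per-load cache/bypass hints can be expressed in CUDA today through PTX cache operators (ld.global.ca caches in L1, ld.global.cg bypasses L1 and caches only in L2). This is only a minimal sketch of how a compile-time classification could be encoded into the binary; the helper names are hypothetical, and the paper's actual encoding, its per-thread dynamic bypassing, and the bypass-aware scheduler require architectural support beyond these instruction-level hints.

    // Minimal CUDA sketch (hypothetical helpers): expressing a per-load
    // cache-vs-bypass decision with PTX cache operators.

    // Load that caches at all levels, including the L1 data cache ("cache" class).
    __device__ float load_cached(const float* addr) {
        float v;
        asm volatile("ld.global.ca.f32 %0, [%1];" : "=f"(v) : "l"(addr));
        return v;
    }

    // Load that bypasses L1 and caches only at L2 ("bypass" class).
    __device__ float load_bypassed(const float* addr) {
        float v;
        asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(addr));
        return v;
    }

    // A compiler pass that classifies each static global load could emit the
    // corresponding variant, for example:
    __global__ void saxpy(float a, const float* x, const float* y, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float xv = load_cached(x + i);    // load classified as cache-friendly
            float yv = load_bypassed(y + i);  // load classified as bypass-preferred
            out[i] = a * xv + yv;
        }
    }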
