Branch and data herding

John Sartori, Rakesh Kumar

doi:10.1145/2370816.2370879

Abstract

Control and memory divergence between threads in the same execution bundle, or warp, can significantly throttle the performance of GPU applications. We exploit the observation that many GPU applications exhibit error tolerance to propose branch and data herding. Branch herding eliminates control divergence by forcing all threads in a warp to take the same control path. Data herding eliminates memory divergence by forcing each thread in a warp to load from the same memory block. To safely and efficiently support branch and data herding, we propose a static analysis and compiler framework to prevent exceptions when control and data errors are introduced, a profiling framework that aims to maximize performance while maintaining acceptable output quality, and hardware optimizations to improve the performance benefits of exploiting error tolerance through branch and data herding. Our software implementation of branch herding on NVIDIA GeForce GTX 480 improves performance by up to 34% (13%, on average) for a suite of NVIDIA CUDA SDK and Parboil benchmarks. Our hardware implementation of branch herding improves performance by up to 55% (30%, on average). Data herding improves performance by up to 32% (25%, on average). Observed output quality degradation is minimal for several applications that exhibit error tolerance, especially for visual computing applications.
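The two herding mechanisms described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the common control path or the common memory block is chosen, so the majority-vote policy, the preserved per-thread offset, the 128-byte block size, and the function names `herd_branch` and `herd_addresses` are all hypothetical choices made for illustration.

```python
from collections import Counter


def herd_branch(predicates):
    """Pick one branch outcome for every thread in a warp.

    Assumption: the outcome is chosen by majority vote over the per-thread
    predicates, with ties resolved as "taken".
    """
    taken = sum(1 for p in predicates if p)
    return taken * 2 >= len(predicates)


def herd_addresses(addresses, block_size=128):
    """Redirect every thread's load into a single memory block.

    Assumption: the block requested by the most threads is chosen, and each
    thread keeps its original byte offset within the block.
    """
    block_votes = Counter(addr // block_size for addr in addresses)
    target_block, _ = block_votes.most_common(1)[0]
    return [target_block * block_size + addr % block_size for addr in addresses]


# A 32-thread warp in which 20 threads would take the branch and 12 would not:
# after herding, all 32 threads follow the "taken" path.
warp_predicates = [True] * 20 + [False] * 12
all_take_branch = herd_branch(warp_predicates)

# Three threads read from block 0 and one from block 2; after herding,
# the outlier is redirected into block 0 at its original offset.
herded = herd_addresses([0, 4, 8, 300])
```

Threads that are overruled by the vote execute the wrong path or load the wrong data, which is the source of the output errors that the abstract says error-tolerant applications can absorb.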

Similar Papers

Branch and Data Herding: Reducing Control and Memory Divergence for Error-Tolerant GPU Applications
John Sartori ... Rakesh Kumar
IEEE Transactions on Multimedia | VOL. 15
01 Feb 2013

Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol
Juan Fang ... Xiaoting Hao
01 Jan 2017

Managing GPU Concurrency in Heterogeneous Architectures
Onur Kayiran ... Rachata Ausavarungnirun
01 Dec 2014

A variable warp size architecture
Timothy G Rogers ... Daniel R Johnson
ACM SIGARCH Computer Architecture News | VOL. 43
13 Jun 2015
