Fine-grain task aggregation and coordination on GPUs

Marc S Orr, David A Wood, Steven K Reinhardt, Bradford M Beckmann

doi:10.1145/2678373.2665701

Abstract

In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction, which facilitates dynamically aggregating asynchronously produced fine-grain work into coarser-grain tasks. However, no practical implementation has been proposed. To this end, we propose and evaluate the first channel implementation. To demonstrate the utility of channels, we present a case study that maps the fine-grain, recursive task spawning in the Cilk programming language to channels by representing it as a flow graph. To support data-parallel recursion in bounded memory, we propose a hardware mechanism that allows wavefronts to yield their execution resources. Through channels and wavefront yield, we implement four Cilk benchmarks. We show that Cilk can scale with the GPU architecture, achieving speedups of as much as 4.3x on eight compute units.
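The aggregation idea behind channels can be illustrated with a small host-side sketch. This is not the paper's implementation: the `Channel` class, its `enqueue` and `drain` methods, and the `WAVEFRONT_SIZE` value are hypothetical names chosen for illustration, and the code runs sequentially on a CPU. It only shows how individually produced work items might be collected and handed to a consumer function in wavefront-sized groups.

```python
WAVEFRONT_SIZE = 4  # hypothetical width for illustration; hardware wavefronts are wider


class Channel:
    """Toy channel: buffers fine-grain items bound for one consumer function."""

    def __init__(self, consumer, width=WAVEFRONT_SIZE):
        self.consumer = consumer  # function applied to a whole batch of items
        self.width = width        # maximum number of items per coarse-grain task
        self.pending = []         # items produced so far but not yet dispatched

    def enqueue(self, item):
        """Called by a producer whenever it generates one fine-grain work item."""
        self.pending.append(item)

    def drain(self):
        """Dispatch pending items in batches of at most `width`; return the batches."""
        launched = []
        while self.pending:
            batch = self.pending[:self.width]
            self.pending = self.pending[self.width:]
            self.consumer(batch)  # one coarse-grain task processes the whole batch
            launched.append(batch)
        return launched


# Usage: ten items arrive one at a time and are consumed as batches of 4, 4, and 2.
results = []
channel = Channel(lambda batch: results.extend(x * x for x in batch))
for value in range(10):
    channel.enqueue(value)
batches = channel.drain()
```

In this sketch the consumer sees full batches whenever enough work is pending, which is the property that lets data-parallel hardware stay occupied even though the items were produced one at a time.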

Journal: ACM SIGARCH Computer Architecture News
Publication Date: Jun 14, 2014
Citations: 53
