POSTER

Guray Ozen, Jesus Labarta, Eduard Ayguade

doi:10.1145/2967938.2974056

Abstract

Early programs for GPU (Graphics Processing Unit) acceleration were based on a flat, bulk-parallel programming model in which programs had to perform a sequence of kernel launches from the host CPU. The latest releases of these devices support dynamic (or nested) parallelism, which makes it possible to launch kernels from threads running on the device without host intervention. Unfortunately, launching kernels from the device incurs a higher overhead than launching them from the host CPU, which makes the exploitation of dynamic parallelism unprofitable. This paper proposes and evaluates the basic idea behind a user-directed code transformation technique, named collective dynamic parallelism (CollectiveDP), that targets the effective exploitation of nested parallelism in modern GPUs. The technique dynamically packs dynamic-parallelism kernel invocations and postpones their execution until a batch of them is available. We show that for sparse matrix-vector multiplication, CollectiveDP outperforms well-optimized libraries, making GPUs useful when matrices are highly irregular.
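
The packing idea described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the authors' implementation: it is plain Python rather than GPU code, and the function name `spmv_collective`, the `threshold` and `batch_size` parameters, and the `launches` counter are all hypothetical names introduced for illustration. Short rows of a CSR matrix are computed inline, while long rows (the ones that would otherwise each trigger a device-side kernel launch) are queued and processed together once a batch has accumulated.

```python
from typing import List, Sequence, Tuple


def spmv_collective(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    vals: Sequence[float],
    x: Sequence[float],
    threshold: int = 2,   # hypothetical: rows longer than this are deferred
    batch_size: int = 2,  # hypothetical: deferred rows per collective launch
) -> Tuple[List[float], int]:
    """CSR sparse matrix-vector product with batched handling of long rows.

    Returns the result vector and the number of simulated collective launches.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    pending: List[int] = []  # rows whose "child kernel" has been postponed
    launches = 0             # simulated collective launches performed

    def row_dot(r: int) -> float:
        # Dot product of CSR row r with the dense vector x.
        return sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[r], row_ptr[r + 1]))

    def flush() -> None:
        # Stand-in for one collective launch that serves all queued rows.
        nonlocal launches
        if not pending:
            return
        launches += 1
        for r in pending:
            y[r] = row_dot(r)
        pending.clear()

    for r in range(n_rows):
        if row_ptr[r + 1] - row_ptr[r] <= threshold:
            y[r] = row_dot(r)        # short row: handled inline
        else:
            pending.append(r)        # long row: pack the request
            if len(pending) >= batch_size:
                flush()
    flush()                          # serve any leftover requests
    return y, launches


# A 4x4 matrix with two short rows and two long rows.
row_ptr = [0, 1, 5, 6, 9]
col_idx = [0, 0, 1, 2, 3, 1, 0, 2, 3]
vals = [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]
y, launches = spmv_collective(row_ptr, col_idx, vals, x)
print(y, launches)
```

In this example the two long rows share a single simulated launch, whereas launching once per long row would need two. The simulation says nothing about actual GPU timing; it only shows how requests are packed and postponed.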

Similar Papers

Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy
Lena Oden ... Holger Fröning
Parallel Computing | VOL. 57
29 Mar 2016

Improving the Performance of the CamShift Algorithm Using Dynamic Parallelism on GPU
Yun Tian ... Yanqing Ji
18 Jul 2017

Parallel SVD Algorithm for a Three-Diagonal Matrix on a Video Card Using the Nvidia CUDA Architecture
Mykola Semylitko ... Gennadii Malaschonok
NaUKMA Research Papers. Computer Science | VOL. 4
10 Dec 2021

SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs
Thaha Muhammed ... Iyad Katib
Applied Sciences | VOL. 9
06 Mar 2019
