Abstract
GPUs are widely employed to accelerate scientific applications, but they demand considerably more programming effort, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions that alleviate this burden; however, synchronous data movement can become a performance bottleneck that prevents GPUs from being fully exploited. We propose a tiling-based programming model and an accompanying library that simplify the development of GPU programs and overlap data movement with computation. The programming model decomposes data and computation into tiles and treats them as the primary units of data transfer and execution, which enables the transfers to be pipelined and their latency hidden. Moreover, partitioning application data into tiles allows the programmer to exploit the GPU even when the application data does not fit into device memory. The library leverages C++ lambda functions, OpenACC directives, CUDA streams, and the tiling API from TiDA to support both productivity and performance. We evaluate the library on a data transfer-intensive kernel and a compute-intensive kernel and compare its speedup against OpenACC and CUDA. The results indicate that the library hides transfer latency, handles cases where device memory is insufficient, and achieves reasonable performance.
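As a rough illustration of the overlap technique the abstract describes, the following minimal CUDA sketch pipelines tile transfers and per-tile kernels using two streams and double buffering. It is not the library's API; the kernel tile_kernel, the tile count NTILES, and the tile size TILE_ELEMS are illustrative assumptions.

// Minimal sketch (assumed names, not the paper's library): while one tile is
// being computed on the device, the next tile's copy proceeds in another stream.
#include <cuda_runtime.h>

__global__ void tile_kernel(float* tile, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[i] = tile[i] * 2.0f;   // placeholder per-tile computation
}

int main() {
    const int    NTILES     = 8;            // number of tiles the data is split into
    const size_t TILE_ELEMS = 1 << 20;      // elements per tile
    const size_t TILE_BYTES = TILE_ELEMS * sizeof(float);

    // Pinned host memory so cudaMemcpyAsync can truly overlap with kernels.
    float* h_data;
    cudaMallocHost(&h_data, NTILES * TILE_BYTES);
    for (size_t i = 0; i < NTILES * TILE_ELEMS; ++i) h_data[i] = 1.0f;

    // Two device buffers and two streams: double buffering across tiles.
    float* d_buf[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&d_buf[b], TILE_BYTES);
        cudaStreamCreate(&stream[b]);
    }

    for (int t = 0; t < NTILES; ++t) {
        int b = t % 2;  // alternate buffer/stream per tile
        cudaMemcpyAsync(d_buf[b], h_data + t * TILE_ELEMS, TILE_BYTES,
                        cudaMemcpyHostToDevice, stream[b]);
        tile_kernel<<<(TILE_ELEMS + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], TILE_ELEMS);
        cudaMemcpyAsync(h_data + t * TILE_ELEMS, d_buf[b], TILE_BYTES,
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();  // wait for all tiles to finish

    for (int b = 0; b < 2; ++b) {
        cudaFree(d_buf[b]);
        cudaStreamDestroy(stream[b]);
    }
    cudaFreeHost(h_data);
    return 0;
}

In this sketch, the pinned host buffer and per-tile streams are what allow the asynchronous copies to overlap with kernel execution, and the fixed-size device buffers show how tiling lets data larger than device memory be processed piece by piece.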