Abstract
In a CPU-GPU heterogeneous computing system, the input data to be processed by a kernel resides in host memory. The host and device memory address spaces are separate, so the device cannot directly access host memory. In the CUDA programming model, data must therefore be moved between host memory and device memory, and this transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide the mechanism for doing so. In this paper we explore the effect of overlapping data transfer and kernel execution on the overall execution time of several CUDA applications. The results show that using the different levels of concurrency supported by streams enhances the performance of the CUDA applications.
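As a rough illustration of the overlap pattern the paper studies, the sketch below partitions the input into chunks and issues each chunk's host-to-device copy, kernel launch, and device-to-host copy into its own stream, so that copies for one chunk can overlap with kernel execution on another. The kernel `scale`, the stream count, and the problem size are assumptions chosen for the example, not details taken from the paper.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernel, used only for illustration.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int N = 1 << 24;            // total elements (assumed)
    const int NSTREAMS = 4;           // number of streams (assumed)
    const int CHUNK = N / NSTREAMS;   // elements per stream

    float *h_data, *d_data;
    // Pinned host memory is required for truly asynchronous copies.
    cudaMallocHost(&h_data, N * sizeof(float));
    cudaMalloc(&d_data, N * sizeof(float));

    cudaStream_t streams[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s)
        cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, runs the kernel on it, and
    // copies it back; operations in different streams may overlap
    // on devices with copy/compute concurrency.
    for (int s = 0; s < NSTREAMS; ++s) {
        int offset = s * CHUNK;
        cudaMemcpyAsync(d_data + offset, h_data + offset,
                        CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(
            d_data + offset, CHUNK, 2.0f);
        cudaMemcpyAsync(h_data + offset, d_data + offset,
                        CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < NSTREAMS; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFreeHost(h_data);
    cudaFree(d_data);
    return 0;
}
```

With a single stream the three operations per chunk would serialize end to end; spreading the chunks across streams lets the copy engines and the compute engine work concurrently, which is the source of the speedups the paper reports.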