PERFORMANCE ENHANCEMENT OF CUDA APPLICATIONS BY OVERLAPPING DATA TRANSFER AND KERNEL EXECUTION

K Raju, Niranjan N Chiplunkar

doi:10.35784/acs-2021-17

Journal: Applied Computer Science	Publication Date: Sep 30, 2021
Citations: 1	License type: cc-by

Abstract

The CPU-GPU combination is a widely used heterogeneous computing system in which the CPU and the GPU have separate address spaces. Because the GPU cannot directly access CPU memory, the input data must be available in GPU memory before the GPU function is invoked. When the GPU function completes, the results of the computation are transferred back to CPU memory. This CPU-GPU data transfer takes place over the PCI-Express (PCI-E) bus, whose bandwidth is much lower than that of GPU memory. Since the transfer speed is limited by the PCI-E bandwidth, the PCI-E bus acts as a performance bottleneck. This paper discusses two approaches to minimizing the data transfer overhead: performing the data transfer while the GPU function is executing, and reducing the amount of data transferred to the GPU. The effect of these approaches on the execution time of a set of CUDA applications is evaluated using CUDA streams. The results of our experiments show that the proposed approaches reduce the execution time of the applications.
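The first approach, overlapping data transfer with kernel execution, can be illustrated with a minimal CUDA sketch. This is not the authors' code: the kernel `scale`, the helper `run_overlapped`, the `CUDA_CHECK` macro, and the chunking scheme are hypothetical names and choices made for illustration. The sketch assumes a device with at least one copy engine and uses pinned host memory, which asynchronous copies need in order to overlap with kernel execution. The input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued into its own stream, so that the copy of one chunk may proceed while another chunk is being processed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking helper (not from the paper).
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiplies each element of a chunk by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Processes n floats in numStreams chunks, one stream per chunk.
// Returns the number of elements with an unexpected result.
int run_overlapped(int n, int numStreams) {
    const int chunk = (n + numStreams - 1) / numStreams;
    float *h = nullptr, *d = nullptr;

    // Pinned (page-locked) host memory lets cudaMemcpyAsync run asynchronously.
    CUDA_CHECK(cudaMallocHost((void **)&h, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    std::vector<cudaStream_t> streams(numStreams);
    for (int s = 0; s < numStreams; ++s)
        CUDA_CHECK(cudaStreamCreate(&streams[s]));

    for (int s = 0; s < numStreams; ++s) {
        const int offset = s * chunk;
        if (offset >= n) break;
        const int len = (n - offset < chunk) ? (n - offset) : chunk;
        const size_t bytes = len * sizeof(float);

        // Operations within one stream run in order; different streams may overlap.
        CUDA_CHECK(cudaMemcpyAsync(d + offset, h + offset, bytes,
                                   cudaMemcpyHostToDevice, streams[s]));
        scale<<<(len + 255) / 256, 256, 0, streams[s]>>>(d + offset, len, 2.0f);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaMemcpyAsync(h + offset, d + offset, bytes,
                                   cudaMemcpyDeviceToHost, streams[s]));
    }
    CUDA_CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f) ++mismatches;

    for (int s = 0; s < numStreams; ++s)
        CUDA_CHECK(cudaStreamDestroy(streams[s]));
    CUDA_CHECK(cudaFree(d));
    CUDA_CHECK(cudaFreeHost(h));
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_overlapped(1 << 20, 4));
    return 0;
}
```

Whether the copies and kernels actually overlap depends on the hardware and on the relative cost of transfer and computation; a profiler timeline is the usual way to confirm it. The number of streams (4 here) is an arbitrary choice for the sketch, not a value taken from the paper.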

Full Text