Abstract

Accelerators such as GPUs have become popular general-purpose computing devices in the field of high-performance computing. As storage and compute capabilities grow, solving complex scientific and engineering problems on CPU–GPU heterogeneous systems has become increasingly important in the big data era. Compute-intensive problems have already been solved successfully with CPU–GPU cooperative computing. However, large-scale data-intensive problems remain difficult to handle, especially those limited by GPU device memory. In this paper, the dual buffer rotation four-stage pipeline (DBFP) mechanism is proposed for CPU–GPU cooperative computation to efficiently handle data-intensive problems that require more memory than a single GPU provides. A data-block-partition-based pipeline computing strategy is designed on top of the DBFP mechanism. On the one hand, it breaks through the bottleneck of limited GPU device memory; on the other hand, it exploits the high performance of both CPU and GPU by overlapping data transfer with computation. Furthermore, the DBFP mechanism is easy to extend to heterogeneous systems equipped with multiple GPUs while achieving high resource utilization. The results show that it achieves 99% and 90% of theoretical performance for dense general matrix multiplication on one GPU and two GPUs, respectively, with Nvidia GTX480 or K40 GPUs. It also enables the K-means and T-nearest-neighbor algorithms to process larger datasets that were previously limited by GPU device memory. We achieve nearly 1.9-fold performance gains through dynamic task scheduling on two GPUs when the performance bottleneck is GPU computing or data transmission.
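
The core idea of streaming data blocks through rotating device buffers so that transfers overlap with kernel execution can be illustrated with standard CUDA streams. The sketch below is only a minimal illustration under assumed names (processChunk, CHUNK, NUM_CHUNKS), not the authors' DBFP implementation; in particular, the paper's four-stage pipeline also involves CPU-side computation, which is omitted here.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical per-block kernel, standing in for the real GPU stage of the pipeline.
__global__ void processChunk(float *d_buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_buf[i] *= 2.0f;
}

int main(void) {
    const int CHUNK = 1 << 20;            // elements per data block
    const int NUM_CHUNKS = 8;             // total blocks streamed through the GPU
    const size_t BYTES = CHUNK * sizeof(float);

    // Pinned host buffer holding the full dataset (which may exceed GPU memory).
    float *h_data;
    cudaMallocHost(&h_data, (size_t)NUM_CHUNKS * BYTES);
    for (int i = 0; i < NUM_CHUNKS * CHUNK; ++i) h_data[i] = 1.0f;

    // Two device buffers rotated between successive chunks (the "dual buffer" idea).
    float *d_buf[2];
    cudaMalloc(&d_buf[0], BYTES);
    cudaMalloc(&d_buf[1], BYTES);

    // Two streams so copy-in, compute, and copy-out of adjacent chunks can overlap.
    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);

    for (int c = 0; c < NUM_CHUNKS; ++c) {
        int s = c & 1;                    // alternate buffer/stream each iteration
        float *h_chunk = h_data + (size_t)c * CHUNK;

        // Stage 1: host-to-device transfer of the next data block.
        cudaMemcpyAsync(d_buf[s], h_chunk, BYTES, cudaMemcpyHostToDevice, stream[s]);
        // Stage 2: GPU computation on that block.
        processChunk<<<(CHUNK + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], CHUNK);
        // Stage 3: device-to-host transfer of the result.
        cudaMemcpyAsync(h_chunk, d_buf[s], BYTES, cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();

    printf("first element after processing: %f\n", h_data[0]);

    cudaStreamDestroy(stream[0]);
    cudaStreamDestroy(stream[1]);
    cudaFree(d_buf[0]);
    cudaFree(d_buf[1]);
    cudaFreeHost(h_data);
    return 0;
}
```

Because the two streams alternate, the copy engines can move chunk c+1 while the compute engine works on chunk c, so GPU memory usage stays bounded at two block-sized buffers regardless of the total dataset size.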
