Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments

Xuedong Zhang, Kenli Li, Xiantao Zhang, Zhuo Tang

doi:10.1109/tpds.2022.3208082

Abstract

The high concurrency and high throughput of graphics processing units (GPUs) have led researchers to keep using them to optimize distributed parallel computing architectures. As processor architectures have evolved, GPUs have come to allow multiple kernels to execute concurrently through stream queues. However, because hardware characteristics and kernel properties differ across a distributed architecture, existing research has not carefully considered how to configure the number of concurrent streams and the kernel block size. Poorly chosen stream concurrency and block size configurations prolong application execution time and waste computing resources. We therefore propose a multi-GPU multi-stream co-concurrency mechanism (MGSC) for distributed heterogeneous environments, which dynamically adjusts the number of concurrent streams and searches for the optimal block size during task scheduling. Based on the memory resources and startup overhead consumed by concurrent stream scheduling, we propose a resource-aware adaptive adjustment mechanism that dynamically tunes the number of concurrent streams. To find the optimal block size, we model the choice as a multi-armed bandit (MAB) problem and propose a block size adjustment algorithm based on the upper confidence bound (UCB). We implement MGSC in Spark 3.1.1 and NVIDIA CUDA 11.2 and evaluate its performance through comparative experiments on multiple typical benchmarks. The experimental results show that the proposed mechanism makes full use of GPU computing power and significantly reduces task execution time.
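The abstract frames block size selection as a multi-armed bandit in which each candidate block size is an arm and UCB balances trying new sizes against reusing the fastest one seen so far. The sketch below shows a generic UCB1-style tuner under stated assumptions; it is not the authors' algorithm. The candidate block sizes, the exploration weight `c`, the reward definition `1 - elapsed / T_MAX_MS`, and the simulated timing function `run_kernel_ms` are all hypothetical stand-ins for a real kernel launch and timer.

```python
import math
import random

# Hypothetical candidate block sizes (threads per block); not taken from the paper.
BLOCK_SIZES = [64, 128, 256, 512, 1024]


class UCBBlockSizeTuner:
    """UCB1-style bandit: each candidate block size is one arm."""

    def __init__(self, block_sizes, c=0.2):
        self.block_sizes = list(block_sizes)
        self.c = c  # exploration weight (assumed); must suit the reward scale
        self.counts = [0] * len(self.block_sizes)   # times each arm was pulled
        self.means = [0.0] * len(self.block_sizes)  # running mean reward per arm
        self.total = 0

    def select(self):
        """Return the index of the block size to try next."""
        # Try every block size once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        log_t = math.log(self.total)

        def ucb(arm):
            # Mean reward plus an exploration bonus that shrinks with more pulls.
            return self.means[arm] + self.c * math.sqrt(log_t / self.counts[arm])

        return max(range(len(self.block_sizes)), key=ucb)

    def update(self, arm, reward):
        """Record a reward in [0, 1] for the chosen arm (incremental mean)."""
        self.total += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def best_block_size(self):
        """Block size with the highest observed mean reward."""
        arm = max(range(len(self.block_sizes)), key=lambda a: self.means[a])
        return self.block_sizes[arm]


# Hypothetical cost model standing in for launching and timing a kernel (ms).
BASE_TIME_MS = {64: 4.0, 128: 3.0, 256: 2.0, 512: 2.5, 1024: 3.5}
T_MAX_MS = 5.0  # assumed upper bound on kernel time, used to scale rewards


def run_kernel_ms(block_size, rng):
    return BASE_TIME_MS[block_size] + rng.uniform(0.0, 0.2)


def tune(rounds=1000, seed=0):
    rng = random.Random(seed)
    tuner = UCBBlockSizeTuner(BLOCK_SIZES)
    for _ in range(rounds):
        arm = tuner.select()
        elapsed = run_kernel_ms(tuner.block_sizes[arm], rng)
        reward = 1.0 - elapsed / T_MAX_MS  # shorter kernel -> larger reward
        tuner.update(arm, reward)
    return tuner


if __name__ == "__main__":
    tuner = tune()
    print("best block size:", tuner.best_block_size())
    print("pulls per block size:", dict(zip(tuner.block_sizes, tuner.counts)))
```

Under this toy cost model, the tuner concentrates most launches on the fastest block size after a short exploration phase. Rewards are scaled into [0, 1] because the UCB bonus assumes bounded rewards, and `c` is set small so the bonus is comparable to the gaps between arms; a real tuner would need both calibrated against measured kernel times.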
