Abstract

In recent years, the parallel computing performance of GPUs (graphics processing units) has grown rapidly. As a result, GPUs are widely used in computationally intensive applications such as image processing, deep learning, and artificial intelligence. Because these applications can be modeled as multiple GPU kernels, some of which depend on others, it is essential to find an efficient method for scheduling dependent kernels on GPU cores. Naively enforcing kernel dependencies by executing the kernels in sequence degrades performance. Furthermore, dependent kernels generally need to share data, so without proper scheduling, unnecessary memory accesses and copies are generated. Neural network models contain many operators that map to both parallel and dependent kernels. This paper proposes an efficient and clear method for building a framework that analyzes an ONNX (Open Neural Network Exchange) model, finds patterns of dependent kernels, and uses those patterns to schedule kernels on the GPU architecture. Preliminary experimental results show that, by combining neural network operators with appropriate memory policies, this technique improves overall performance by 8% and reduces the cache miss rate by 14% on average.
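The abstract's core idea, extracting kernel dependencies from a model graph so that independent kernels can be launched concurrently, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the node tuples below are a hypothetical stand-in for ONNX `GraphProto` nodes (each of which carries `input` and `output` tensor names), and the grouping by topological level is one simple way to expose kernels that have no dependency chain between them.

```python
from collections import defaultdict

# Hypothetical stand-in for an ONNX graph: each node is
# (name, op_type, inputs, outputs), mirroring the input/output
# tensor-name fields of ONNX NodeProto entries.
nodes = [
    ("conv1", "Conv", ["x", "w1"], ["t1"]),
    ("relu1", "Relu", ["t1"], ["t2"]),
    ("conv2", "Conv", ["x", "w2"], ["t3"]),
    ("add1",  "Add",  ["t2", "t3"], ["y"]),
]

def build_dependencies(nodes):
    """Map each node to the set of nodes whose outputs it consumes."""
    producer = {}
    for name, _, _, outputs in nodes:
        for out in outputs:
            producer[out] = name
    deps = defaultdict(set)
    for name, _, inputs, _ in nodes:
        for inp in inputs:
            if inp in producer:  # graph inputs/weights have no producer
                deps[name].add(producer[inp])
    return dict(deps)

def independent_groups(nodes, deps):
    """Partition nodes into topological levels: nodes in the same
    level share no dependency chain, so their kernels are candidates
    for concurrent launch; assumes `nodes` is topologically ordered."""
    level = {}
    for name, *_ in nodes:
        level[name] = 1 + max((level[d] for d in deps.get(name, ())),
                              default=-1)
    groups = defaultdict(list)
    for name, lv in level.items():
        groups[lv].append(name)
    return [groups[lv] for lv in sorted(groups)]
```

Here `conv1` and `conv2` land in the same group because neither consumes the other's output, while `relu1` and `add1` must wait for their producers, which is exactly the dependent-kernel pattern the paper's framework looks for.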
