Metric Selection for GPU Kernel Classification

S.-Kazem Shekofteh,Hadi Sadoghi Yazdi,Hamid Noori,Mahmoud Naghibzadeh,Holger Fröning

doi:10.1145/3295690

Abstract

Graphics Processing Units (GPUs) are vastly used for running massively parallel programs. GPU kernels exhibit different behavior at runtime and can usually be classified in a simple form as either “compute-bound” or “memory-bound.” Recent GPUs are capable of concurrently running multiple kernels, which raises the question of how to most appropriately schedule kernels to achieve higher performance. In particular, co-scheduling of compute-bound and memory-bound kernels seems promising. However, its benefits as well as drawbacks must be determined along with which kernels should be selected for a concurrent execution. Classifying kernels can be performed online by instrumentation based on performance counters. This work conducts a thorough analysis of the metrics collected from various benchmarks from Rodinia and CUDA SDK. The goal is to find the minimum number of effective metrics that enables online classification of kernels with a low overhead. This study employs a wrapper-based feature selection method based on the Fisher feature selection criterion. The results of experiments show that to classify kernels with a high accuracy, only three and five metrics are sufficient on a Kepler and a Pascal GPU, respectively. The proposed method is then utilized for a runtime scheduler. The results show an average speedup of 1.18× and 1.1× compared with a serial and a random scheduler, respectively.

Full Text