Abstract

Graphics processing units (GPUs) have become integral to high-performance computing in the push toward exascale performance. Understanding and estimating GPU performance is crucial for developers designing performance-driven and energy-efficient applications for a given architecture. This work presents a model, built on static analysis of CUDA code, that predicts the execution time of NVIDIA GPU kernels without running them. The PTX code is statically analyzed to extract instruction features, control flow, and data dependences. We propose a scheduling algorithm that satisfies resource-reservation constraints to schedule these instructions in threads across streaming multiprocessors (SMs). We use dynamic analysis to build a set of memory access penalty models and combine these models with the scheduling information to estimate the execution time of the code. Our experimental results show that this approach works across NVIDIA GPU architectures. We first tested the model on two Kepler machines, where the mean percentage error (MPE)/mean absolute percentage error (MAPE) was 8.88%/28.3% for the Tesla K20 and 5.66%/29.4% for the Quadro K4200. We further tested the model on Maxwell and Pascal architectures, recording MPEs/MAPEs of 10.64%/47.8% and %/28.5%, respectively.
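As a point of reference for the reported accuracy figures, the following sketch shows the standard definitions of MPE and MAPE as they are conventionally computed for prediction models; the abstract does not spell out the formulas, so this is an assumption based on common usage. The kernel timing values below are purely illustrative, not data from the paper.

```python
# Standard error metrics (assumed definitions): MPE keeps the sign of each
# relative error, so over- and under-predictions can cancel, whereas MAPE
# averages absolute relative errors. This is why MAPE >= |MPE| in the
# abstract's reported results.

def mpe(predicted, actual):
    """Mean percentage error, in percent (signed)."""
    return 100.0 * sum((p - a) / a for p, a in zip(predicted, actual)) / len(actual)

def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

# Illustrative (made-up) kernel execution times in milliseconds:
actual = [1.0, 2.0, 4.0]       # measured runtimes
predicted = [1.2, 1.6, 4.4]    # model estimates

print(round(mpe(predicted, actual), 2))   # signed errors +20%, -20%, +10% average to 3.33
print(round(mape(predicted, actual), 2))  # absolute errors 20%, 20%, 10% average to 16.67
```

The gap between MPE and MAPE in the abstract (e.g., 8.88% vs. 28.3% on the Tesla K20) suggests the model's per-kernel over- and under-predictions largely offset each other in the signed average.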
