KSM: Online Application-Level Performance Slowdown Prediction for Spatial Multitasking GPGPU

Wenyi Zhao, Minyi Guo, Quan Chen

doi:10.1109/lca.2018.2851207

Abstract

Colocating multiple applications on the same spatial multitasking GPGPU improves system-wide throughput. However, the colocated applications are slowed down to different degrees by contention for streaming multiprocessors (SMs), the L2 cache, and global memory bandwidth. The ability to precisely predict application slowdowns online is useful in many scenarios, e.g., ensuring fair pricing in multi-tenant cloud systems. Prior work on predicting application slowdown is either inaccurate, because it ignores contention on SMs, or inefficient, because it relies on expensive sequential profiling of concurrent applications via runtime environment switching. To solve this problem, we propose KSM, which enables precise and efficient application-level slowdown prediction without a priori application knowledge. KSM is based on the observation that the hardware event statistics generated by colocated applications are strongly correlated with their slowdowns. Specifically, KSM builds a slowdown model offline from hardware event statistics using machine learning techniques. At runtime, KSM collects the hardware event statistics and predicts the slowdowns of all colocated applications using the model. Our experimental results show that KSM has negligible runtime overhead and precisely predicts application-level slowdowns, with a prediction error smaller than 9.7 percent.
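The offline-train, online-predict workflow described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the hardware events or the learning technique, so the three normalized features (`sm_contention`, `l2_miss_rate`, `dram_bw_utilization`), the linear model fitted by gradient descent, and the synthetic training data are all hypothetical stand-ins. Slowdown is assumed to mean colocated execution time divided by standalone execution time.

```python
# Illustrative sketch only: the feature names, the linear model, and all
# numbers below are assumptions, not taken from the paper.

def fit_linear(samples, slowdowns, lr=0.5, epochs=2000):
    """Offline step: fit slowdown ~ w . x + bias by batch gradient descent."""
    n, d = len(samples), len(samples[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(samples, slowdowns):
            err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_slowdown(model, events):
    """Online step: map observed event statistics to a predicted slowdown."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, events)) + bias


# Synthetic offline training set. Each sample is a hypothetical, normalized
# (sm_contention, l2_miss_rate, dram_bw_utilization) tuple, and the slowdown
# labels follow a made-up linear relation so the fit can be checked.
training_events = [(a, b, c)
                   for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]
training_slowdowns = [1.0 + 0.5 * a + 1.0 * b + 0.3 * c
                      for a, b, c in training_events]

model = fit_linear(training_events, training_slowdowns)

# "Runtime": statistics observed for one colocated application (made up).
observed = (0.5, 0.2, 0.8)
predicted = predict_slowdown(model, observed)
actual = 1.0 + 0.5 * 0.5 + 1.0 * 0.2 + 0.3 * 0.8
error = abs(predicted - actual) / actual
print(f"predicted slowdown: {predicted:.2f}, relative error: {error:.2%}")
```

Because the synthetic labels are noise-free and exactly linear, the relative error here is near zero. That says nothing about real workloads, where the choice of events and model would determine accuracy.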

Full Text