Abstract

Predicting the performance degradation a GPU application suffers under co-location on a spatial multitasking GPU, without prior application knowledge, is essential in public clouds. Prior work mainly targets CPU co-location and is inaccurate and/or inefficient when predicting the performance of applications co-located on spatial multitasking GPUs. Our investigation shows that the hardware event statistics generated by co-located applications strongly correlate with their slowdowns. Based on this observation, we present Themis with a kernel slowdown model (Themis-KSM), which performs precise and efficient online application slowdown prediction without prior application knowledge. The kernel slowdown model is trained offline; when new applications co-run, Themis-KSM collects their event statistics and predicts their slowdowns simultaneously. In addition, we propose a two-stage slowdown prediction mechanism (Themis-TSP) for real-system GPUs that requires no hardware modification. Our evaluation shows that Themis has negligible runtime overhead, and that Themis-KSM and Themis-TSP predict application-level slowdown with prediction errors smaller than 9.5% and 12.8%, respectively. Based on Themis, we also implement an SM allocation engine to rein in application slowdown at co-location. Case studies show that the engine successfully enforces fair sharing and QoS.
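
To make the offline-training/online-prediction split concrete, the following minimal sketch fits a regression from per-kernel hardware event statistics to measured slowdown and then applies it to a newly co-located kernel. The event counter names, the linear model form, and the synthetic data are illustrative assumptions, not the actual feature set or model used by Themis-KSM.

```python
# Sketch of an offline-trained slowdown model, assuming a simple linear
# regression over hardware event counters collected during co-location.
# Counter names below are hypothetical placeholders.
import numpy as np

EVENTS = ["l2_misses", "dram_transactions", "active_warps", "issue_stalls"]

def train_slowdown_model(event_matrix: np.ndarray,
                         measured_slowdowns: np.ndarray) -> np.ndarray:
    """Offline phase: fit weights mapping co-run event statistics to kernel slowdown."""
    # Append a bias column and solve ordinary least squares.
    X = np.hstack([event_matrix, np.ones((event_matrix.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, measured_slowdowns, rcond=None)
    return weights

def predict_slowdown(weights: np.ndarray, event_sample: np.ndarray) -> float:
    """Online phase: predict the slowdown of a newly co-located kernel from its counters."""
    return float(np.append(event_sample, 1.0) @ weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.random((64, len(EVENTS)))                     # event statistics from profiled co-runs
    y_train = 1.0 + X_train @ np.array([0.5, 0.8, 0.1, 0.3])    # synthetic slowdowns (>= 1x)
    w = train_slowdown_model(X_train, y_train)
    print("predicted slowdown:", predict_slowdown(w, rng.random(len(EVENTS))))
```

In this sketch the model is trained once on profiled co-runs; at deployment time only the cheap dot product in predict_slowdown runs online, which is consistent with the negligible runtime overhead the abstract claims for Themis.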
