Abstract

Predicting performance degradation of a GPU application when it is co-located with other applications on a spatial multitasking GPU without prior application knowledge is essential in public Clouds. Prior work mainly targets CPU co-location, and is inaccurate and/or inefficient for predicting performance of applications at co-location on spatial multitasking GPUs. Our investigation shows that hardware event statistics caused by co-located applications, which can be collected with negligible overhead, strongly correlate with their slowdowns. Based on this observation, we present Themis, an online slowdown predictor that can precisely and efficiently predict application slowdown without prior application knowledge. We first train a precise slowdown model offline using hardware event statistics collected from representative co-locations. When new applications co-run, Themis collects event statistics and predicts their slowdowns simultaneously. Our evaluation shows that Themis has negligible runtime overhead and can precisely predict application-level slowdown with prediction error smaller than 9.5%. Based on Themis, we also implement an SM allocation engine to rein in application slowdown at co-location. Case studies show that the engine successfully enforces fair sharing and QoS.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call